
Introduction

Call quality isn’t just about how clearly the agent speaks — it’s also about how clearly the agent can hear. In contact centers, where efficiency and accuracy drive performance, background noise from the customer side can have a major impact on agent productivity and experience. Customers often call in from noisy environments — traffic, households, public spaces — introducing acoustic clutter that leads to repetition, frustration, and longer handle times.

Traditional noise cancellation solutions have focused primarily on the outbound audio channel, removing noise from the agent’s side before it reaches the customer. But that only solves half the problem. Inbound noise cancellation, which removes customer-side noise before the audio reaches the agent or the AI, is often just as important.

At Krisp, we’ve long recognized the importance of customer-side audio cleanup, and we’ve been solving it at scale for years. Our vision for inbound noise cancellation as a key enabler for better agent experiences is detailed in this article, where we highlight how noisy customer audio affects handle times, comprehension, and overall call quality.

Today, Krisp’s mature, production-grade inbound noise cancellation models power real-world applications:

  • Krisp AI Meeting Assistant — deployed for years in Krisp’s desktop app, helping professionals clearly hear their remote counterparts during online meetings — even when the other side is calling from a noisy café, home, or airport.
  • Krisp AI Contact Center — used by BPOs and customer support teams to clean up customer voices in live calls, boosting agent comprehension.
  • Krisp SDK — starting in March 2025, Krisp’s inbound Noise and Voice Cancellation technology became available through our SDKs for seamless integration with server-side, real-time voice AI systems. Today, Krisp powers some of the largest production Voice Bots, helping them solve critical challenges like turn-taking accuracy, background noise robustness, and ASR performance in real-world environments.

Sanas entered the market with an outbound noise cancellation solution that became generally available in August 2024. On May 30, 2025, Sanas announced a new omnidirectional noise cancellation model, claiming support for both inbound and outbound audio cleanup, including real-time customer-side voice processing.

Given the importance of inbound noise cancellation, we decided to put this new offering to the test.

Just as we conducted an in-depth comparison of Sanas vs. Krisp for outbound noise cancellation, we ran a technical evaluation of Sanas’s inbound noise cancellation solution against Krisp’s production-grade models, focusing on real-world call center scenarios, voice quality, and effectiveness in handling background speech.

Understanding the Differences Between Inbound and Outbound Noise Cancellation

While both inbound and outbound noise cancellation (NC) aim to improve voice clarity, the conditions they operate under are fundamentally different. The constraints summarized in the table below make inbound NC a technically more complex and demanding task, and not all noise cancellation models are designed to handle it effectively.

| Aspect | Outbound Noise Cancellation | Inbound Noise Cancellation |
|---|---|---|
| Audio Source | Agent’s local microphone | Customer audio received over network |
| Audio Quality | High-fidelity, uncompressed | Compressed, degraded audio (e.g., VoIP, PSTN) |
| Typical Sample Rate | 32 kHz | 8 kHz or 16 kHz |
| Use Cases | Improving how the customer hears the agent | Improving how the agent or AI hears the customer |
| Speaker Scenarios | Typically single-speaker | Single or multi-speaker (e.g., speakerphones, conference rooms) |

With these fundamental differences between inbound and outbound noise cancellation in mind, we evaluated how Krisp and Sanas approach the inbound side of the problem.

Operational Differences: Krisp vs. Sanas

While both Krisp and Sanas aim to improve customer voice clarity for agents, their architectural choices, product maturity, and performance under real-world conditions vary significantly.

The table below summarizes the key differences between Krisp’s and Sanas’s inbound noise cancellation solutions based on our analysis.

| Aspect | Krisp | Sanas |
|---|---|---|
| Model Design | Use-case optimized models tailored for different inbound noise cancellation situations | Single, multi-purpose model for both inbound and outbound NC |
| Audio Quality | Up to 16 kHz | Up to 8 kHz |
| Use Case Coverage | krisp-viva-v6-lite: integrated world-class Voice Isolation technology; general-purpose AI model for WebRTC, mobile, and telephony (up to 16 kHz), resilient to codec artifacts (e.g., G.711). krisp-nc-i-v8-pro: multi-speaker model optimized for 16 kHz far-field use cases like conference rooms | Single, omnichannel AI model used across all conditions |
| Production Maturity | Mature, production-grade models used across enterprise, SDK, and desktop | Inbound noise cancellation announced in May 2025; production readiness unverified |
| Deployment through SDK | Available | Unknown |

Note: All observations regarding Sanas’s inbound noise cancellation performance are based on tests of the publicly available Sanas version 3.2.72, conducted in July 2025.

Krisp vs. Sanas: In-depth inbound noise cancellation evaluation

In this section, we present a comparative summary of evaluations conducted on Krisp and Sanas inbound noise cancellation technologies. These evaluations reflect real-world usage scenarios and benchmark data commonly produced by enterprise customers and BPOs assessing solution fit and performance.

We cover the comparison methodology, present objective and crowdsourced subjective evaluation results, and share comparative audio samples.

Evaluation Methodology and Metrics

For quantitative evaluation, we used the POLQA (Perceptual Objective Listening Quality Analysis) metric — an industry-standard objective metric for predicting perceived listening quality. POLQA is suitable for evaluating narrowband and wideband speech affected by noise, compression artifacts, and signal degradation.

We also scored the outputs with Meta’s Audiobox Aesthetics model, a reference-free, ML-based model that quantitatively assesses listening experience and audio quality. We report two of its scores, Production Quality and Content Enjoyment. While not a direct replacement for human perception, it adds a complementary viewpoint to our analysis.

| Objective Metric | Interpretation | Highly Correlated to Subjective Metric | What It Captures |
|---|---|---|---|
| POLQA | Higher is better | Speech Intelligibility & MOS | Fidelity and clarity under real-world network conditions; penalizes distortion and noise artifacts in the speech |
| Production Quality | Higher is better | Speech Clarity | Fidelity, presence of audio artifacts, balance, and clarity of the output signal |
| Content Enjoyment | Higher is better | Natural Speech | Perceived naturalness, fluidity, and enjoyment of listening, akin to human listening satisfaction |

In addition to the objective metrics, we ran a crowdsourced subjective evaluation in which participants compared anonymized paired audio samples (e.g., Sanas vs. Krisp) and answered the question, “Which audio sounds more pleasant and clear?”

Evaluated Models

To ensure a fair comparison, we focused our primary benchmark on single-speaker inbound noise cancellation scenarios, since Sanas’s model appears to perform some level of secondary background speech suppression — suggesting a form of voice isolation. As such, we compared it directly with Krisp’s Background Voice Cancellation (BVC) enabled inbound model, which is also optimized for single-speaker voice isolation.

However, to offer a more comprehensive view of Krisp’s capabilities, we also included Krisp’s multi-speaker inbound model in the evaluation. This demonstrates how Krisp performs in far-field environments such as speakerphone and group calls, where multiple speakers talk at a distance from the microphone.

| Model | Sampling Rate | Speaker Scenario | Voice Isolation | Near Field/Far Field |
|---|---|---|---|---|
| krisp-viva-v6-lite | up to 16 kHz | Single Speaker | Yes | Near Field |
| krisp-nc-i-v8-pro | up to 16 kHz | Multi Speaker | No | Far Field |
| sanas-inbound | up to 8 kHz | Single Speaker* | Limited | Both |

* This assumption is based on the performance for cases with background speech.

Evaluation Dataset

We created a controlled test dataset by mixing English utterances from the ITU-T P.501 dataset with 24 different real-world background noises at 0 dB, 5 dB, and 10 dB SNR levels. To simulate realistic telephony transmission conditions, we encoded and decoded the noisy mixtures with common voice codecs (G.729, G.711, and Opus) before feeding the degraded audio into each model.
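The SNR mixing step can be illustrated with a minimal sketch. This is not the tooling used for the evaluation: the helper names (`rms`, `mix_at_snr`, `measured_snr_db`) are hypothetical, the speech and noise are synthetic stand-ins rather than P.501 utterances or recorded noises, and the SNR is assumed to be defined on RMS levels over the whole clip.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech-to-noise ratio equals `snr_db`, then add it to `speech`.

    Returns the mixture and the scaled noise (hypothetical helper, for illustration).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    # SNR(dB) = 20 * log10(rms_speech / rms_noise), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    scaled = [n * gain for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


def measured_snr_db(speech, scaled_noise):
    """Recompute the SNR from the two components as a sanity check."""
    return 20 * math.log10(rms(speech) / rms(scaled_noise))


# Synthetic stand-ins: a 220 Hz tone as "speech" and uniform noise as "background".
rng = random.Random(0)
sample_rate = 8000
speech = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(sample_rate)]

for snr in (0, 5, 10):  # the three SNR levels named in the text
    mixed, scaled = mix_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, scaled), 6))
```

At 0 dB the noise is as loud as the speech, which is why that condition is the hardest of the three. Codec simulation would follow this step and is not shown here.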

Note: Krisp natively produces higher-quality audio at a 16 kHz sampling rate. For the head-to-head comparison, though, we standardized the evaluation pipeline by downsampling Krisp’s output to 8 kHz, matching Sanas’s maximum supported sample rate. This put both systems on the same narrowband footing for POLQA and the other evaluations.
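The article does not say which resampler was used, so the following is only a sketch of what 16 kHz to 8 kHz downsampling involves: a low-pass filter below the new 4 kHz Nyquist limit, followed by keeping every second sample. The function names, the 63-tap Hamming-windowed sinc filter, and the 0.23 cutoff are all illustrative assumptions.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter.

    `cutoff` is a fraction of the input sample rate (0 < cutoff < 0.5);
    `num_taps` is assumed to be odd.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def downsample_by_2(samples, num_taps=63):
    """Filter out content above the new Nyquist frequency, then keep every second sample."""
    taps = lowpass_fir(num_taps, cutoff=0.23)  # about 3.7 kHz at a 16 kHz input rate
    half = num_taps // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):
                acc += tap * samples[j]
        out.append(acc)
    return out


def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine wave, used here as a test signal."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


wideband_rate = 16000
kept = downsample_by_2(tone(1000, wideband_rate, 1600))     # below 4 kHz: preserved
removed = downsample_by_2(tone(6000, wideband_rate, 1600))  # above 4 kHz: filtered out
# Compare levels away from the edges, where the filter is fully overlapped.
print(len(kept), round(rms(kept[100:700]), 3), round(rms(removed[100:700]), 3))
```

The 6 kHz tone shows why downsampling handicaps the 16 kHz model: everything between 4 kHz and 8 kHz, including much of the energy in consonants, is discarded before scoring.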

Evaluation Results

The following table summarizes subjective and objective evaluation of Krisp vs. Sanas across key metrics.

Here, the original audio was mixed with various noise types and processed using the Krisp and Sanas models. For a fair comparison, the Krisp model’s output was downsampled to 8 kHz to enable direct comparison with Sanas.

| Metric | Type | Krisp | Sanas | Winner |
|---|---|---|---|---|
| POLQA: Home noise | Objective | ✅ 3.7/5 | ❌ 3.1/5 | Krisp |
| POLQA: Street noise | Objective | ✅ 3.7/5 | ❌ 3.1/5 | Krisp |
| POLQA: Cafe noise | Objective | ✅ 3.8/5 | ❌ 2.9/5 | Krisp |
| POLQA: Distractor noise | Objective | ✅ 3.8/5 | ❌ 3.3/5 | Krisp |
| Meta Audiobox: Content Enjoyment | Objective | ✅ 4.7/10 | ❌ 3.8/10 | Krisp |
| Meta Audiobox: Production Quality | Objective | ✅ 5.2/10 | ❌ 3.9/10 | Krisp |
| “Which audio sounds more pleasant and clear?” Preferred by (# votes / total responses) | Subjective | 704/960 | 256/960 | Krisp |

The following sections provide a deeper comparison of Krisp’s inbound models, evaluated at both 8 kHz and 16 kHz output resolutions, highlighting how sampling rate and model specialization impact voice quality, noise suppression, and the listener’s experience.

Objective Evaluation – POLQA

Key Takeaways

  1. Krisp’s krisp-viva-v6-lite model consistently outperforms all other models, delivering the highest POLQA score across all four noise environments.
    • It provides an average improvement of +0.59 POLQA points over Sanas, and +0.38 over Krisp’s multi-speaker model (krisp-nc-i-v8-pro).
  2. Sanas’s inbound model shows gains over the original noisy audio (avg. +1.48 points), but lags behind Krisp in every scenario:
    • In café noise, krisp-viva-v6-lite is ahead by +0.87 POLQA points, the largest gap of the four environments.
    • In distractor noise, where competing speech overlaps with the target voice, krisp-viva-v6-lite outperforms Sanas by +0.47 points, highlighting the effectiveness of Krisp’s dedicated voice isolation design.
  3. Krisp’s krisp-nc-i-v8-pro model performs on par with krisp-viva-v6-lite in ambient noise conditions (home, street, café), with a difference of less than 0.2 points, but drops sharply in distractor noise (2.53 vs. 3.82 for krisp-viva-v6-lite), confirming that it is not tuned for background voice suppression.

Objective Evaluation – Meta Audiobox Aesthetics

In this evaluation, we compared Krisp’s best-performing inbound model, krisp-viva-v6-lite, at both 8 kHz and 16 kHz output rates, against Sanas’s inbound model, which supports only 8 kHz output. To ensure a fair comparison, we downsampled Krisp’s output to 8 kHz when required.

ℹ️ Note: These objective metrics are scored on a 1–10 scale. Even studio-quality recordings with rich prosody and zero background noise typically score just under 9 in our experiments. As such, a delta of 0.3–0.5 points between models represents a meaningful difference in perceived speech quality.

Key Takeaways

  1. Krisp’s krisp-viva-v6-lite model leads in both Audiobox Aesthetics metrics.
    • Krisp at 16 kHz significantly outpaces all other variants, especially Sanas, which trails at 3.79 (Content Enjoyment) and 3.94 (Production Quality). This represents a margin of +1.5 to +2.7 points, a substantial gap.
    • Even at 8 kHz, Krisp retains its edge. When downsampled to 8 kHz to match Sanas’s maximum output rate, Krisp still delivers +0.87 higher Content Enjoyment and +1.3 higher Production Quality.
  2. Sanas struggles with perceived listening quality.
    • Sanas’s lower scores indicate noticeably reduced speech fidelity and listener enjoyment.
    • Sanas’s output actually scores lower than the original noisy audio.

💡Interestingly, Sanas’s inbound noise cancellation output scores lower than the original noisy audio in both metrics, particularly in Production Quality (3.94 vs. 5.04). A likely explanation is that, while the model removes the background noise, it introduces audible artifacts or residual noise that degrade the overall listening experience. These issues are clearly perceptible in the sample audio, even with low-quality built-in speakers, and especially with the USB headsets that agents typically use.

Subjective Evaluation – Crowdsourced A/B testing

We processed 24 noisy audio samples using both krisp-viva-v6-lite and sanas-inbound, then submitted them for evaluation.

  • Each audio pair was compared 40 times, resulting in a total of 960 votes.
  • Listeners were asked: “Which audio sounds more pleasant and clear?”
  • For a fair comparison, we used the krisp-viva-v6-lite outputs downsampled to 8 kHz.
  • To further eliminate bias, all branding information was removed from the file names and other metadata.

Here are the results:

krisp-viva-v6-lite – 704 votes

sanas-inbound – 256 votes
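As a rough reading of these counts, the sketch below computes the preference share and a normal-approximation check against an even 50/50 split. It assumes every vote is an independent trial, which is a simplification: the same 24 pairs were each rated 40 times, and the article does not say how many distinct listeners took part, so the interval should be taken as indicative only.

```python
import math

krisp_votes = 704
sanas_votes = 256
total = krisp_votes + sanas_votes  # 24 pairs x 40 comparisons each

share = krisp_votes / total

# z-score against the null hypothesis of no preference (p = 0.5),
# using the normal approximation to the binomial distribution.
z = (krisp_votes - 0.5 * total) / math.sqrt(total * 0.25)

# Approximate 95% confidence interval for the preference share (Wald interval).
std_err = math.sqrt(share * (1 - share) / total)
ci_low, ci_high = share - 1.96 * std_err, share + 1.96 * std_err

print(f"preference share: {share:.1%}")
print(f"z-score vs. 50/50: {z:.1f}")
print(f"approx. 95% CI: {ci_low:.1%} to {ci_high:.1%}")
```

Under these assumptions, roughly 73% of votes favored krisp-viva-v6-lite, and the interval sits well above 50%.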

Comparative Audio Samples

🎧 Pro Tip: For the best listening experience, we recommend using USB or wired headphones to clearly pick up subtle audio artifacts.

Cafe noise

Original

krisp-viva-v6-lite at 8 kHz

krisp-viva-v6-lite at 16 kHz

sanas-inbound at 8 kHz

Street noise

Original

krisp-viva-v6-lite at 8 kHz

krisp-viva-v6-lite at 16 kHz

sanas-inbound at 8 kHz

Distractor noise

Original

krisp-viva-v6-lite at 8 kHz

krisp-viva-v6-lite at 16 kHz

sanas-inbound at 8 kHz

Home noise

Original

krisp-viva-v6-lite at 8 kHz

krisp-viva-v6-lite at 16 kHz

sanas-inbound at 8 kHz

Conclusion

Across both objective metrics (like POLQA and Meta Audiobox Aesthetics) and crowdsourced subjective A/B testing, Krisp consistently delivered better speech clarity, fewer audio artifacts, and a more natural listening experience. In fact, Krisp’s model outperformed Sanas in every evaluated scenario, including those with challenging noise types like background speech and telephony degradation.

If you need reliability, voice quality, and real-world performance that scales across your teams and customers — Krisp is the clear and proven choice.
