Introduction
Call quality isn’t just about how clearly the agent speaks — it’s also about how clearly the agent can hear. In contact centers, where efficiency and accuracy drive performance, background noise from the customer side can have a major impact on agent productivity and experience. Customers often call in from noisy environments — traffic, households, public spaces — introducing acoustic clutter that leads to repetition, frustration, and longer handle times.
Traditional noise cancellation solutions have focused primarily on the outbound audio channel, removing noise from the agent’s side before it reaches the customer. But that only solves half the problem. Inbound noise cancellation — removing distractions from the customer side before it reaches the agent or the AI — is often just as important.
At Krisp, we’ve long recognized the importance of customer-side audio cleanup, and we’ve been solving it at scale for years. Our vision for inbound noise cancellation as a key enabler for better agent experiences is detailed in this article, where we highlight how noisy customer audio affects handle times, comprehension, and overall call quality.
Today, Krisp’s mature, production-grade inbound noise cancellation models power real-world applications:
- Krisp AI Meeting Assistant — deployed for years in Krisp’s desktop app, helping professionals clearly hear their remote counterparts during online meetings — even when the other side is calling from a noisy café, home, or airport.
- Krisp AI Contact Center — used by BPOs and customer support teams to clean up customer voices in live calls, boosting agent comprehension.
- Krisp SDK — starting in March 2025, Krisp’s inbound Noise and Voice Cancellation technology became available through our SDKs for seamless integration with server-side, real-time voice AI systems. Today, Krisp powers some of the largest production Voice Bots, helping them solve critical challenges like turn-taking accuracy, background noise robustness, and ASR performance in real-world environments.
Sanas entered the market with an outbound noise cancellation solution that became generally available in August 2024. On May 30, 2025, Sanas announced a new omnidirectional noise cancellation model, claiming support for both inbound and outbound audio cleanup, including real-time customer-side voice processing.
Given the importance of inbound noise cancellation, we decided to put this new offering to the test.
Just as we conducted an in-depth comparison of Sanas vs. Krisp for outbound noise cancellation, we ran a technical evaluation of Sanas’s inbound noise cancellation solution against Krisp’s production-grade models, focusing on real-world call center scenarios, voice quality, and effectiveness in handling background speech.
Understanding the Differences Between Inbound and Outbound Noise Cancellation
While both inbound and outbound noise cancellation (NC) aim to improve voice clarity, the conditions they operate under are fundamentally different. The constraints on the inbound side, summarized in the table below, make inbound NC a technically more complex and demanding task, and not all noise cancellation models are designed to handle it effectively.
Aspect | Outbound Noise Cancellation | Inbound Noise Cancellation |
---|---|---|
Audio Source | Agent’s local microphone | Customer audio received over network |
Audio Quality | High-fidelity, uncompressed | Compressed, degraded audio (e.g., VoIP, PSTN) |
Typical Sample Rate | 32 kHz | 8 kHz or 16 kHz |
Use Cases | Improving how the customer hears the agent | Improving how the agent or AI hears the customer |
Speaker Scenarios | Typically single-speaker | Single or multi-speaker (e.g., speakerphones, conference rooms) |
With these fundamental differences between inbound and outbound noise cancellation in mind, we evaluated how Krisp and Sanas approach the inbound side of the problem.
Operational Differences: Krisp vs. Sanas
While both Krisp and Sanas aim to improve customer voice clarity for agents, their architectural choices, product maturity, and performance under real-world conditions vary significantly.
The table below summarizes the key differences between Krisp’s and Sanas’s inbound noise cancellation solutions based on our analysis.
Aspect | Krisp | Sanas |
---|---|---|
Model Design | Use-case optimized models tailored for different inbound Noise Cancellation situations | Single, multi-purpose model for both inbound and outbound NC |
Audio Quality | Up to 16 kHz | Up to 8 kHz |
Use Case Coverage | krisp-viva-v6-lite: integrated world-class Voice Isolation technology. General-purpose AI model for WebRTC, mobile, and telephony (up to 16 kHz), resilient to codec artifacts (e.g., G.711) | Single, omnichannel AI model used across all conditions |
Production Maturity | Mature, production-grade models used across enterprise, SDK, and desktop | Inbound noise cancellation announced in May 2025; production readiness unverified |
Deployment through SDK | Available | Unknown |
Note: All observations regarding Sanas’s inbound noise cancellation performance are based on tests of the publicly available Sanas version 3.2.72, conducted in July 2025.
Krisp vs. Sanas: In-depth inbound noise cancellation evaluation
In this section, we present a comparative summary of evaluations conducted on Krisp and Sanas inbound noise cancellation technologies. These evaluations reflect real-world usage scenarios and benchmark data commonly produced by enterprise customers and BPOs assessing solution fit and performance.
We cover the comparison methodology, present objective and crowdsourced subjective evaluation results, and share comparative audio samples.
Evaluation Methodology and Metrics
For quantitative evaluation, we used the POLQA (Perceptual Objective Listening Quality Analysis) metric — an industry-standard objective metric for predicting perceived listening quality. POLQA is suitable for evaluating narrowband and wideband speech affected by noise, compression artifacts, and signal degradation.
We also processed the outputs using Meta’s Audiobox Aesthetics model, a reference-free, ML-based model that quantitatively assesses listening experience and audio quality. We report two of its scores, Production Quality and Content Enjoyment. While not a direct replacement for human perception, it adds a complementary viewpoint to our analysis.
Objective Metric | Interpretation | Highly Correlated to Subjective Metric | What It Captures |
---|---|---|---|
POLQA | Higher is better | Speech Intelligibility & MOS | Fidelity and clarity under real-world network conditions; penalizes distortion and noise artifacts in the speech |
Production Quality | Higher is better | Speech Clarity | Fidelity, presence of audio artifacts, balance, and clarity of the output signal |
Content Enjoyment | Higher is better | Natural Speech | Perceived naturalness, fluidity, and enjoyment of listening — akin to human listening satisfaction |
In addition to the objective metrics, we ran a subjective crowdsourced evaluation in which participants compared anonymized paired audio samples (e.g., Sanas vs. Krisp) and answered the question, “Which audio sounds more pleasant and clear?”
Evaluated Models
To ensure a fair comparison, we focused our primary benchmark on single-speaker inbound noise cancellation scenarios, since Sanas’s model appears to perform some level of secondary background speech suppression — suggesting a form of voice isolation. As such, we compared it directly with Krisp’s Background Voice Cancellation (BVC) enabled inbound model, which is also optimized for single-speaker voice isolation.
However, to offer a more comprehensive view of Krisp’s capabilities, we also included Krisp’s multi-speaker inbound model in the evaluation. This demonstrates how Krisp performs in far-field environments, such as speakerphone and group calls, where multiple speakers talk at a distance from the microphone.
Model | Sampling Rate | Speaker Scenario | Voice Isolation | Near Field/Far Field |
---|---|---|---|---|
krisp-viva-v6-lite | up to 16 kHz | Single Speaker | Yes | Near Field |
krisp-nc-i-v8-pro | up to 16 kHz | Multi Speaker | No | Far Field |
sanas-inbound | up to 8 kHz | Single Speaker* | Limited | Both |
* This assumption is based on the model’s performance in cases with background speech.
Evaluation Dataset
We created a controlled test dataset by mixing English utterances from the ITU-T P.501 dataset with 24 different real-world background noises at SNR levels of 0 dB, 5 dB, and 10 dB. To simulate realistic telephony transmission conditions, we applied common voice codecs (G.729, G.711, and Opus) before feeding the degraded audio into each model.
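As a rough illustration of the mixing step, the sketch below scales a noise signal so that the speech-to-noise ratio hits a target value and then adds it to the speech. The article does not say how signal levels were measured, so the plain RMS level used here is an assumption (evaluation pipelines often use an active speech level instead), and the synthetic tone and white noise are hypothetical stand-ins for the P.501 utterances and recorded noises.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech/noise RMS ratio equals `snr_db`, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    # SNR_dB = 20 * log10(rms_speech / rms_noise), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    scaled_noise = [n * gain for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, scaled_noise):
    """SNR actually achieved after scaling, for verification."""
    return 20 * math.log10(rms(speech) / rms(scaled_noise))


# Hypothetical stand-ins: a 200 Hz tone as "speech", white noise as "noise".
SAMPLE_RATE = 8000
rng = random.Random(0)
speech = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE)]

for snr in (0, 5, 10):
    mixture, scaled = mix_at_snr(speech, noise, snr)
    print(f"target {snr} dB, measured {measured_snr_db(speech, scaled):.2f} dB")
```

In a real pipeline the mixture would then be passed through the codec of interest before being handed to the noise cancellation model.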
Note: Krisp natively produces higher-quality audio at a 16 kHz sampling rate. For the head-to-head comparison, though, we standardized the evaluation pipeline by downsampling Krisp’s output to 8 kHz, matching Sanas’s maximum supported sample rate. This ensured a fair reference test dataset and alignment for POLQA and other narrowband evaluations.
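To make the downsampling step concrete, here is a minimal sketch of 16 kHz to 8 kHz conversion: low-pass filter below the new Nyquist frequency (4 kHz), then keep every second sample. The article does not describe which resampler was used, so the Hamming-windowed sinc filter, the 101-tap length, and the 3.6 kHz cutoff are all illustrative assumptions. A production pipeline would more likely call a library resampler such as `scipy.signal.resample_poly`.

```python
import math


def lowpass_taps(num_taps, cutoff_hz, sample_rate):
    """Hamming-windowed sinc low-pass FIR taps, normalised to unity gain at DC."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def downsample_by_2(signal, sample_rate=16000, num_taps=101):
    """Low-pass below the new Nyquist frequency, then keep every second sample."""
    cutoff_hz = 0.9 * (sample_rate / 4)  # 3.6 kHz for 16 kHz input (assumed margin)
    taps = lowpass_taps(num_taps, cutoff_hz, sample_rate)
    half = num_taps // 2
    out = []
    # Only the filter outputs that survive decimation are computed.
    for i in range(0, len(signal), 2):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):  # zero padding at the edges
                acc += tap * signal[j]
        out.append(acc)
    return out


def tone(freq_hz, num_samples, sample_rate=16000):
    """Unit-amplitude sine wave used as a test signal."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(num_samples)]


in_band = downsample_by_2(tone(1000, 1600))      # below 4 kHz: should pass through
out_of_band = downsample_by_2(tone(6000, 1600))  # above 4 kHz: should be suppressed
print(len(in_band), max(abs(v) for v in out_of_band[100:700]) < 0.02)
```

Skipping the low-pass filter would fold the 6 kHz tone down to 2 kHz as an alias, which is why the filter comes before the decimation.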
Evaluation Results
The following table summarizes subjective and objective evaluation of Krisp vs. Sanas across key metrics.
Here, the original audio was mixed with various noise types and processed using the Krisp and Sanas models. For a fair comparison, the Krisp model’s output was downsampled to 8 kHz to enable direct comparison with Sanas.
Metric | Type | Krisp | Sanas | Winner |
---|---|---|---|---|
POLQA: Home noise | Objective | ✅ 3.7/5 | ❌ 3.1/5 | Krisp |
POLQA: Street noise | Objective | ✅ 3.7/5 | ❌ 3.1/5 | Krisp |
POLQA: Cafe noise | Objective | ✅ 3.8/5 | ❌ 2.9/5 | Krisp |
POLQA: Distractor noise | Objective | ✅ 3.8/5 | ❌ 3.3/5 | Krisp |
Meta Audiobox: Content Enjoyment | Objective | ✅ 4.7/10 | ❌ 3.8/10 | Krisp |
Meta Audiobox: Production Quality | Objective | ✅ 5.2/10 | ❌ 3.9/10 | Krisp |
Which audio sounds more pleasant and clear? Preferred by (# votes / total responses) | Subjective | ✅704/960 | ❌256/960 | Krisp |
The following sections provide a deeper comparison of Krisp’s inbound models, evaluated at both 8 kHz and 16 kHz output resolutions, highlighting how sampling rate and model specialization impact voice quality, noise suppression, and the listener’s experience.
Objective Evaluation – POLQA
Key Takeaways
- Krisp’s krisp-viva-v6-lite model consistently outperforms all other models, delivering the highest POLQA score across all four noise environments.
  - It provides an average improvement of +0.59 POLQA points over Sanas, and +0.38 over Krisp’s model with multi-speaker support.
- Sanas’s inbound model shows gains over the original noisy audio (avg. +1.48 points), but lags behind Krisp in every scenario:
  - In café noise, krisp-viva-v6-lite is ahead by a very significant +0.87 POLQA points.
  - In distractor noise, where competing speech overlaps with the target voice, the krisp-viva-v6-lite model outperforms Sanas by a significant +0.47 points, highlighting the effectiveness of Krisp’s dedicated voice isolation design.
- Krisp’s krisp-nc-i-v8-pro model performs on par with krisp-viva-v6-lite in ambient noise conditions (home, street, café), with <0.2 difference, but drops sharply in distractor noise (scoring 2.53 vs. 3.82 for krisp-viva-v6-lite), confirming it’s not tuned for background voice suppression.
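The per-scenario gaps can be cross-checked against the summary table with a few lines of arithmetic. The sketch below uses only the rounded one-decimal scores from that table, so its results can differ slightly from the figures quoted in the takeaways, which appear to be computed from unrounded scores.

```python
# Rounded POLQA scores copied from the summary table: (Krisp, Sanas).
polqa = {
    "Home": (3.7, 3.1),
    "Street": (3.7, 3.1),
    "Cafe": (3.8, 2.9),
    "Distractor": (3.8, 3.3),
}

# Per-scenario advantage of Krisp over Sanas, in POLQA points.
deltas = {name: round(krisp - sanas, 2) for name, (krisp, sanas) in polqa.items()}
mean_delta = round(sum(deltas.values()) / len(deltas), 2)

for name, delta in deltas.items():
    print(f"{name}: +{delta:.2f}")
print(f"Mean: +{mean_delta:.2f}")
```

Because each table entry is rounded to one decimal place, any single difference computed this way can be off by up to about 0.1 point from the unrounded value.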
Objective Evaluation – Meta Audiobox Aesthetics
In this evaluation, we compared Krisp’s best-performing inbound model, krisp-viva-v6-lite, at both 8 kHz and 16 kHz output levels, against Sanas’s inbound model, which supports only 8 kHz output. To ensure a fair comparison, we downsampled Krisp’s output to 8 kHz when required.
ℹ️ Note: These objective metrics measure on a 1-10 scale. Even studio-quality recordings with rich prosody and zero background noise typically score just under 9 in our experiments. As such, a delta of 0.3–0.5 points between models represents a meaningful difference in perceived speech quality.
Key Takeaways
- Krisp’s krisp-viva-v6-lite model leads in both Audiobox Aesthetics metrics.
  - Krisp at 16 kHz significantly outpaces all other variants, especially Sanas, which trails at 3.79 (Content Enjoyment) and 3.94 (Production Quality). This represents a margin of +1.5 to +2.7 points, a substantial gap.
  - Even at 8 kHz, Krisp retains its edge. When downsampled to 8 kHz to match Sanas’s maximum output rate, Krisp still delivers +0.87 higher Content Enjoyment and +1.3 higher Production Quality.
- Sanas struggles with perceived listening quality.
  - Sanas’s lower scores indicate noticeably reduced speech fidelity and listener enjoyment.
  - Sanas’s output actually scores lower than the original audio.
💡Interestingly, Sanas’s inbound noise cancellation output scores lower than the original noisy audio in both metrics — particularly in Production Quality (3.94 vs. 5.04). This can be explained by the fact that while the model removes the background noise, it actually introduces audible artifacts or residual noise, which degrade the overall listening experience. These issues are clearly perceptible in the sample audio, even with low-quality built-in speakers, but especially with USB headsets agents typically use.
Subjective Evaluation – Crowdsourced A/B testing
We processed 24 noisy audio samples using both krisp-viva-v6-lite and sanas-inbound, then submitted them for evaluation.
- Each audio pair was compared 40 times, resulting in a total of 960 votes.
- Listeners were asked: “Which audio sounds more pleasant and clear?”
- For fair comparison, a downsampled 8 kHz version of the krisp-viva-v6-lite model’s outputs was used.
- To further eliminate bias, all branding information was removed from the file names and other metadata.
Here are the results:
krisp-viva-v6-lite: 704 votes
sanas-inbound: 256 votes
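For readers who want to gauge how decisive this split is, the sketch below computes the preference share, a 95% Wald confidence interval, and a z-score against a 50/50 null hypothesis. It treats all 960 votes as independent, which is a simplifying assumption: votes are grouped by audio pair and by listener, so the true uncertainty is somewhat larger than this calculation suggests.

```python
import math

PAIRS, COMPARISONS_PER_PAIR = 24, 40
krisp_votes, sanas_votes = 704, 256
total = krisp_votes + sanas_votes
assert total == PAIRS * COMPARISONS_PER_PAIR  # 24 pairs x 40 comparisons

# Share of votes preferring the Krisp output.
share = krisp_votes / total

# z-score against the null hypothesis of no preference (p = 0.5).
z = (share - 0.5) / math.sqrt(0.25 / total)

# 95% Wald interval for the preference share (normal approximation).
se = math.sqrt(share * (1 - share) / total)
ci_low, ci_high = share - 1.96 * se, share + 1.96 * se

print(f"share={share:.3f}, z={z:.1f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

Under these assumptions, roughly 73% of votes favored the Krisp output, and the interval stays well above the 50% mark.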
Comparative Audios
🎧 Pro Tip: For the best listening experience, we recommend using USB or wired headphones to clearly pick up subtle audio artifacts.
[Embedded audio samples for four noise types: cafe, street, distractor, and home noise. Each set pairs the original noisy recording with the processed outputs, including krisp-viva-v6-lite at 8 kHz and Sanas-inbound at 8 kHz.]
Conclusion
Across both objective metrics (like POLQA and Meta Audiobox Aesthetics) and crowdsourced subjective A/B testing, Krisp consistently delivered better speech clarity, fewer audio artifacts, and a more natural listening experience. In fact, Krisp’s model outperformed Sanas in every evaluated scenario, including those with challenging noise types like background speech and telephony degradation.
If you need reliability, voice quality, and real-world performance that scales across your teams and customers — Krisp is the clear and proven choice.