
Overview

We are excited to announce the release of a new lightweight Voice Isolation model (krisp-viva-tel-lite-v1), the result of continuous innovation from the research team at Krisp. The model is available in Krisp’s VIVA SDK.

It’s designed to improve turn-taking for Voice AI Agents in challenging environments with background noise and other distractions, while running efficiently on CPUs.

Compared to its predecessor, krisp-bvc-o-lite-v2, it delivers significantly better background noise and secondary voice suppression, adds support for inbound telephony use cases, and does so within the same CPU footprint.

Despite being 3.5x smaller, it performs comparably to the larger models in Krisp’s industry-leading Voice Isolation lineup for most use cases, at a much lower CPU cost. For example, it is a viable alternative to krisp-viva-tel-v1 in the Krisp VIVA SDK, the default choice for world-class Voice Isolation in Conversational AI use cases.

In the following sections, we benchmark krisp-viva-tel-lite-v1 against several models from the Krisp family of Noise Cancellation and Voice Isolation models:

  • krisp-bvc-o-lite-v2 – its predecessor, the lightweight Voice Isolation model for microphone audio streams.
  • krisp-bvc-o-v2 – standard Voice Isolation model for microphone audio streams.
  • krisp-viva-tel-v1 – industry-leading, bi-directional Voice Isolation model designed for Conversational AI use cases, such as Voice bots.

 

Operational Details

The krisp-viva-tel-lite-v1 model includes the following capabilities:

  • Performs background voice and noise cancellation with the primary speaker detected by speaker-to-microphone proximity cues
  • Processes audio at 16 kHz bandwidth
  • Achieves 15 ms algorithmic latency
  • Matches its predecessor in size while adding support for inbound voice isolation
  • Supports narrow-band audio streams, including telephony and inbound call scenarios
  • Compatible with a wide range of codecs such as G729, G711, G722, OPUS, and others
  • Maintains acceptable performance on Bluetooth devices, including AirPods and AirPods Max, even with the microphone positioned away from the speaker’s mouth
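To put the latency figure in concrete terms, the short sketch below converts 15 ms into a sample count. It reads the 16 kHz figure as the sampling rate, which is an assumption, and the helper name `latency_in_samples` is purely illustrative. Algorithmic latency covers only the delay inherent to the algorithm, so compute time and I/O buffering would come on top of it.

```python
SAMPLE_RATE_HZ = 16_000       # assumption: the 16 kHz figure is the sampling rate
ALGORITHMIC_LATENCY_MS = 15   # from the capability list


def latency_in_samples(sample_rate_hz: int, latency_ms: float) -> int:
    """Return how many audio samples a given latency spans."""
    return round(sample_rate_hz * latency_ms / 1000)


# 16,000 samples/s * 0.015 s = 240 samples
print(latency_in_samples(SAMPLE_RATE_HZ, ALGORITHMIC_LATENCY_MS))  # 240
```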

POLQA Evaluation Setup

POLQA—a gold-standard objective metric for perceived audio quality—was used to evaluate model performance across realistic conditions.

The English dataset contains 72 audio files mixed with different types of noise at a 10 dB SNR. To cover phone-call recordings and inbound scenarios, additional versions of the same dataset were created: Narrowband, G729AnnexBA, G711, and OPUS.
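One common way to build a noisy test set like this is to scale the noise so that the speech-to-noise ratio hits the target before the two signals are summed. The sketch below illustrates that arithmetic for a 10 dB target. It is not Krisp's actual data pipeline: the helper names (`rms`, `mix_at_snr`), the synthetic tone standing in for speech, and the Gaussian noise are all assumptions made for illustration.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech RMS / noise RMS equals `snr_db`, then add it.

    Returns the mixture and the scaled noise.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    # SNR_dB = 20 * log10(speech_rms / noise_rms), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


# Stand-in signals: one second of a 440 Hz tone and Gaussian noise at 16 kHz.
random.seed(0)
speech = [0.5 * math.sin(2 * math.pi * 440 * n / 16_000) for n in range(16_000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16_000)]

mixed, scaled_noise = mix_at_snr(speech, noise, snr_db=10)
achieved = 20 * math.log10(rms(speech) / rms(scaled_noise))
print(f"{achieved:.2f} dB")  # 10.00 dB
```

With real recordings, the level is often measured only over the active speech segments rather than the whole file, which this sketch does not attempt.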

In the baseline wideband scenario for microphone-side voice isolation, krisp-viva-tel-lite-v1 performs on par with the ~2x larger krisp-bvc-o-v2, demonstrating that its reduced size does not come at the cost of perceptual quality.

Across all evaluated datasets, it also consistently outperforms its predecessor krisp-bvc-o-lite-v2, showing clear improvements in clarity, noise handling, and overall robustness. Most notably, the new model exhibits significantly stronger performance in narrow-band and codec-degraded conditions—areas that are especially critical for inbound telephony use cases. Even when the input audio is heavily constrained by codec artifacts or low bandwidth, the model maintains stable and reliable output quality, making it well-suited for real-world telephony environments.
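To give a feel for what "codec-degraded" means, the sketch below passes samples through continuous mu-law companding with 8-bit quantization. This is only a rough approximation: real G.711 uses a segmented, piecewise-linear version of this curve on 8 kHz audio, and codecs such as G729 and OPUS work very differently. The function names are hypothetical, and nothing here reflects how the evaluation datasets were actually produced.

```python
import math

MU = 255  # mu-law companding parameter


def mu_law_encode(x: float) -> int:
    """Compress a sample in [-1, 1] and quantize it to an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round((y + 1) / 2 * 255)


def mu_law_decode(code: int) -> float:
    """Expand an 8-bit code back to a sample in [-1, 1]."""
    y = code / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def degrade(samples):
    """Round-trip samples through the companding quantizer."""
    return [mu_law_decode(mu_law_encode(s)) for s in samples]


# Stand-in input: 10 ms of a 440 Hz tone at 8 kHz.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8_000) for n in range(80)]
degraded = degrade(tone)
max_err = max(abs(a - b) for a, b in zip(tone, degraded))
print(f"max round-trip error: {max_err:.4f}")
```

The quantization error grows with amplitude, which is the intended trade-off of companding: quiet passages keep finer resolution than loud ones.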

 


The examples below show challenging scenarios where krisp-viva-tel-lite-v1 outperforms its predecessor.

Audio examples (original vs. krisp-viva-tel-lite-v1):

  • Suppression: narrow-band, landline
  • Leakage: Bluetooth AirPod
  • Distorted bandwidth: USB headset

 

Integration into the Krisp VIVA Package

krisp-viva-tel-v1, widely regarded as the best Voice Isolation model for Conversational AI use cases, remains the highest-quality option for difficult inbound scenarios.

Despite its small size, krisp-viva-tel-lite-v1 delivers strong quality, performing remarkably close to the ~3.5x larger krisp-viva-tel-v1, the industry’s default Voice Isolation model for Voice bots. The larger model continues to lead in situations involving extreme noise or severely degraded bandwidth, but in most inbound audio scenarios the smaller model behaves similarly and produces comparable results. These distinctions are also reflected in the POLQA metrics.

Given its high performance and lower CPU usage, krisp-viva-tel-lite-v1 is a viable alternative to krisp-viva-tel-v1 for massive server-side deployments that provide clean audio to Voice bots, where CPU is at a premium. It is also a great choice for running Voice Isolation on edge devices, in mobile applications, and in environments with strict hardware constraints.

 
