Introducing 3.5x smaller Voice Isolation model with exceptional accuracy

Nov 24, 2025

Written by Krisp Engineering Team

Overview

We are excited to announce the release of a new lightweight Voice Isolation model (krisp-viva-tel-lite-v1), the result of continuous innovation from the research team at Krisp. The model is available in Krisp’s VIVA SDK.

It’s designed to improve turn-taking for Voice AI Agents in challenging environments with background noise and other distractions, while running efficiently on CPUs.

Compared to its predecessor, krisp-bvc-o-lite-v2, it delivers significantly better background noise and secondary voice suppression, adds support for inbound telephony use cases, and does so within the same CPU footprint.

Despite being 3.5x smaller, it delivers performance comparable to the larger models in Krisp’s Voice Isolation lineup for most use cases, at a much lower CPU cost. For example, it’s a viable alternative to krisp-viva-tel-v1 in the Krisp VIVA SDK, the default choice for Voice Isolation in Conversational AI use cases.

In the following sections, we benchmark the krisp-viva-tel-lite-v1 model against these models from the Krisp family of Noise Cancellation and Voice Isolation models:

krisp-bvc-o-lite-v2 – its predecessor, the lightweight Voice Isolation model for microphone audio streams.
krisp-bvc-o-v2 – standard Voice Isolation model for microphone audio streams.
krisp-viva-tel-v1 – industry-leading, bi-directional, Voice Isolation model designed for Conversational AI use-cases, such as Voice bots.

Operational Details

The krisp-viva-tel-lite-v1 model includes the following capabilities:

Performs background voice and noise cancellation with the primary speaker detected by speaker-to-microphone proximity cues
Processes audio at 16 kHz bandwidth
Achieves 15 ms algorithmic latency
Same size as its predecessor, but also supports inbound voice isolation
Supports narrow-band audio streams, including telephony and inbound call scenarios
Compatible with a wide range of codecs such as G729, G711, G722, OPUS, and others
Maintains acceptable performance on Bluetooth devices, including AirPods and AirPods Max, even with the microphone positioned away from the speaker’s mouth
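
To make the latency figure concrete, here is a minimal sketch that converts the 15 ms algorithmic latency into a sample count. It assumes that "16 kHz" refers to the sampling rate and that audio is 16-bit mono PCM; neither is confirmed by this post, and the helper name ms_to_samples is made up for illustration and is not part of the Krisp VIVA SDK.

```python
# Illustrative arithmetic only; not part of the Krisp VIVA SDK.

ALGORITHMIC_LATENCY_MS = 15   # figure quoted in the capability list above
SAMPLE_RATE_HZ = 16_000       # ASSUMPTION: "16 kHz" is read as the sampling rate
BYTES_PER_SAMPLE = 2          # ASSUMPTION: 16-bit mono PCM


def ms_to_samples(duration_ms, sample_rate_hz):
    """Number of samples that span `duration_ms` at `sample_rate_hz`."""
    return round(duration_ms * sample_rate_hz / 1000)


LATENCY_SAMPLES = ms_to_samples(ALGORITHMIC_LATENCY_MS, SAMPLE_RATE_HZ)
LATENCY_BYTES = LATENCY_SAMPLES * BYTES_PER_SAMPLE

if __name__ == "__main__":
    print(LATENCY_SAMPLES, "samples,", LATENCY_BYTES, "bytes")
```

Under these assumptions, the model's algorithmic delay corresponds to 240 samples of audio.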

POLQA Evaluation Setup

POLQA—a gold-standard objective metric for perceived audio quality—was used to evaluate model performance across realistic conditions.

The English dataset contains 72 audio files mixed with different types of noise at 10 dB SNR. To cover phone-call recordings and inbound scenarios, new versions of the same dataset were created (Narrowband, G729AnnexBA, G711, and OPUS), covering telephony-type use cases and codec-induced degradations.
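
For readers who want to build a similar test set, mixing clean speech with noise at a target SNR can be sketched as follows. This is a generic textbook approach and not necessarily how the dataset above was built; the function names are hypothetical, and the sine wave and Gaussian noise only stand in for real recordings.

```python
import math
import random


def mean_power(samples):
    """Mean of the squared samples of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it to `speech`."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the required noise power
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """SNR of `mixture`, treating everything that is not `speech` as noise."""
    residual = [m - s for m, s in zip(mixture, speech)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16_000
    # Placeholder signals: a 220 Hz tone as "speech", white noise as "noise"
    speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    mixture = mix_at_snr(speech, noise, snr_db=10.0)
    print(f"measured SNR: {measured_snr_db(speech, mixture):.2f} dB")
```

In practice, the power would usually be computed over speech-active segments only, so that silence in the clean file does not skew the ratio.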

In the baseline wideband scenario for microphone-side voice isolation, krisp-viva-tel-lite-v1 performs on par with the ~2x larger krisp-bvc-o-v2, demonstrating that its reduced size does not come at the cost of perceptual quality.

Across all evaluated datasets, it also consistently outperforms its predecessor krisp-bvc-o-lite-v2, showing clear improvements in clarity, noise handling, and overall robustness. Most notably, the new model exhibits significantly stronger performance in narrow-band and codec-degraded conditions—areas that are especially critical for inbound telephony use cases. Even when the input audio is heavily constrained by codec artifacts or low bandwidth, the model maintains stable and reliable output quality, making it well-suited for real-world telephony environments.
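
To give a feel for what codec-induced degradation means, the sketch below passes samples through mu-law companding with 8-bit quantization, the idea behind G.711 mu-law. It is a simplified continuous approximation and not a bit-exact G.711 codec, and it is not the procedure used to generate the codec datasets in this evaluation, which the post does not describe. All function names are illustrative.

```python
import math

MU = 255  # companding constant of the mu-law curve


def mulaw_compress(x):
    """Map a sample in [-1, 1] onto the logarithmic mu-law scale."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize_8bit(y):
    """Round a value in [-1, 1] to one of 256 evenly spaced levels."""
    code = round((y + 1) / 2 * 255)
    return code / 255 * 2 - 1


def simulate_mulaw(samples):
    """Compress, quantize to 8 bits, and expand each sample."""
    return [mulaw_expand(quantize_8bit(mulaw_compress(s))) for s in samples]


if __name__ == "__main__":
    for original in (0.001, 0.1, 0.5, 0.9):
        decoded = simulate_mulaw([original])[0]
        print(f"{original:.3f} -> {decoded:.5f} (error {abs(decoded - original):.5f})")
```

The quantization error grows with signal level, which is one reason codec-processed speech sounds different from the wideband original. A real evaluation pipeline would presumably run the audio through actual codec implementations instead.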

These results translate directly into measurable gains across both outbound and inbound scenarios.

Outbound audio processing: krisp-viva-tel-lite-v1 shows a 2% improvement over krisp-bvc-o-lite-v2 while remaining comparable to krisp-bvc-o-v2.
Inbound audio processing: the improvements are more pronounced. The model achieves a 5% POLQA improvement over krisp-bvc-o-lite-v2 and 8% over krisp-bvc-o-v2, consistent with the observed robustness under narrow-band and codec-stressed conditions.

Below are examples of challenging scenarios where krisp-viva-tel-lite-v1 outperforms its predecessor.

Audio samples (Original vs. krisp-viva-tel-lite-v1):
Suppression: narrow-band, landline
Leakage: Bluetooth AirPods
Distorted bandwidth: USB headset

Integration into the Krisp VIVA Package

The krisp-viva-tel-v1 model, widely regarded as the best Voice Isolation model for Conversational AI use cases, remains the highest-quality option for difficult inbound scenarios.

Despite its small size, krisp-viva-tel-lite-v1 delivers strong quality, performing at a level remarkably close to the ~3.5x larger krisp-viva-tel-v1, the industry’s default Voice Isolation model for voice bots. The larger model continues to lead in situations involving extreme noise or severely degraded bandwidth, but in most inbound audio scenarios the smaller model behaves similarly and produces comparable results. These distinctions are also reflected in the POLQA metrics.

Given its high performance and lower CPU usage, krisp-viva-tel-lite-v1 is a viable alternative to krisp-viva-tel-v1 for large-scale server-side deployments that provide clean audio to voice bots, where CPU is at a premium. It is also a strong choice for running Voice Isolation on edge devices, in mobile applications, and in environments with strict hardware constraints.
