How does Voice Isolation differ from noise cancellation?
Noise cancellation removes background sounds. Voice Isolation goes further — it removes both background noise and secondary human voices, ensuring only the primary speaker's voice reaches your VAD or STT pipeline. This eliminates false interruptions caused by nearby speakers or cross-talk.
What's the difference between Turn Prediction and Interruption Prediction?
Turn Prediction identifies when a speaker is about to finish talking, so your AI agent can respond at the right moment without awkward pauses. Interruption Prediction determines whether a user who speaks while the agent is responding intends to interrupt or is simply asking a question or giving a backchannel like "uh-huh." Together, they give your agent the conversational awareness to handle real dialogue.
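To make the division of labor concrete, here is a minimal sketch of how an agent loop might act on the two signals. It assumes each model produces a per-frame score between 0 and 1. The names `FrameSignals`, `decide`, and both thresholds are hypothetical and are not taken from the VIVA SDK.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_LISTENING = "keep_listening"
    START_RESPONSE = "start_response"
    KEEP_SPEAKING = "keep_speaking"
    STOP_AND_LISTEN = "stop_and_listen"


@dataclass
class FrameSignals:
    # Hypothetical per-frame scores in [0, 1]; not real SDK field names.
    end_of_turn: float   # Turn Prediction: the user is about to finish
    interruption: float  # Interruption Prediction: the user wants the floor


def decide(agent_speaking: bool, signals: FrameSignals,
           turn_threshold: float = 0.7,
           interrupt_threshold: float = 0.6) -> Action:
    """Pick the agent's next action from the two scores.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if agent_speaking:
        # While the agent talks, only the interruption score matters:
        # a low score (for example a backchannel) lets the agent continue.
        if signals.interruption >= interrupt_threshold:
            return Action.STOP_AND_LISTEN
        return Action.KEEP_SPEAKING
    # While the user talks, the end-of-turn score decides when to respond.
    if signals.end_of_turn >= turn_threshold:
        return Action.START_RESPONSE
    return Action.KEEP_LISTENING
```

In this sketch a backchannel such as "uh-huh" would be expected to yield a low interruption score, so the agent keeps speaking, while a high end-of-turn score during user speech triggers the response without waiting for a long silence.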
Do VIVA models require transcription or language-specific configuration?
No. All VIVA models operate directly on the audio signal, so no transcription step is needed. They are language-agnostic and support multiple languages natively, with no per-language tuning or configuration required.
Can VIVA models be used together or independently?
Each model in the VIVA family — Voice Isolation, Turn Prediction, Interruption Prediction, and VAD — works as a standalone component. You can deploy one or combine them depending on your pipeline needs. Most voice AI agent deployments benefit from running them together for the best conversational experience.
What are the deployment requirements?
VIVA models are lightweight and optimized for on-server CPU deployment. They integrate directly into your existing voice pipeline — typically in front of your VAD or STT — and are available via C, Python, Node.js, Go, and Rust bindings, as well as frameworks like LiveKit and Pipecat.
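The "in front of your VAD or STT" placement can be sketched as a chain of per-frame stages. This is an illustration of the ordering only: `run_pipeline`, `toy_isolation`, and `toy_vad` are hypothetical stand-ins written for this example, the frame format is an assumption, and none of them reflect the actual VIVA bindings or how Voice Isolation works internally.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]              # one frame of PCM samples (assumed format)
Stage = Callable[[Frame], Frame]


def run_pipeline(frames: Iterable[Frame], stages: List[Stage]) -> Iterator[Frame]:
    """Pass each frame through the stages in order and yield the result."""
    for frame in frames:
        for stage in stages:
            frame = stage(frame)
        yield frame


def toy_isolation(frame: Frame) -> Frame:
    # Stand-in for a Voice Isolation call: zeroes low-amplitude samples.
    # A real model would be invoked here instead.
    return [s if abs(s) >= 0.1 else 0.0 for s in frame]


def toy_vad(frame: Frame, threshold: float = 0.01) -> bool:
    # Stand-in for the downstream VAD: mean energy compared to a threshold.
    if not frame:
        return False
    energy = sum(s * s for s in frame) / len(frame)
    return energy >= threshold


if __name__ == "__main__":
    frames = [[0.05, -0.05, 0.02], [0.5, -0.4, 0.05]]
    # Isolation runs first, so the VAD only ever sees cleaned audio.
    for cleaned in run_pipeline(frames, [toy_isolation]):
        print(toy_vad(cleaned))
```

Keeping each stage as a plain callable means a real isolation model could replace `toy_isolation` without touching the VAD or STT code that follows it.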
Can I use RTC and VIVA models together?
Yes, although they serve different use cases. VIVA is built for human-to-AI communication: voice AI agents and bots. RTC is built for human-to-human communication: calls, meetings, and contact centers. If your platform handles both scenarios, you can deploy models from each family where they're needed in your pipeline.
How many languages does Voice Translation support?
Voice Translation supports 60+ languages for real-time bidirectional translation. It handles speech-to-speech translation directly, preserving conversational flow without requiring speakers to wait for text-based translation steps. It is optimized for contact center environments, where agents and customers need to communicate naturally across language barriers.