Voice Translation API.
Built for accuracy.

Real-time speech-to-speech translation built for accuracy-critical applications. 61 languages, any-to-any pairs.

96% accuracy on real calls
60+ languages with any-to-any pairs
60 minutes of free sign-up credit

Start speaking, we'll translate in real time.

SOC 2 Certified
GDPR Compliant
HIPAA Compliant
PCI-DSS Certified
Organizations worldwide trust us
Discord
Twilio
RingCentral
oVice
Daily
Dyte
Aircall
PhoneBurner
Gather
Interact
Roam
Vonage
Symphony
CarrierX
Zoho
Altea
Phound

Translation Technology
Same translation behind Krisp CX Enterprise

Built inside enterprise contact centers, where accuracy is not optional

96% Accuracy

Most voice translation APIs report accuracy on clean benchmark recordings. Krisp's 96% comes from live enterprise calls with real customers, real accents, and real background noise.

Names, Numbers, Emails…

Policy numbers, medication names, account details, dates of birth. The kind of content that typically gets hallucinated or garbled comes through accurately.

61 Languages, Any-to-Any

Translate from any source language to any target language, including locale-specific variants like US Spanish, French Canadian, Egyptian Arabic, Catalan, Basque, and Galician.

Background Voice Cancellation

Built-in BVC handles background noise, competing voices, and reverberation. Real-world audio from mobile phones, headsets, and call center environments works without preprocessing.

Accent Robust

Indian, Hispanic, and other accented speech translates with little to no accuracy degradation.

Custom Vocabulary and Dictionary

Add your terms (medication names, product names, jargon) so the engine recognizes them, then set how each translates per language pair ("copay" → "copago" in Spanish).

The same technology powering Krisp CX Enterprise

The core engine behind live enterprise contact center deployments, now available as an API.

96% accuracy on live calls, real accents, real noise
1M+ minutes of production call translation
60+ languages, any-to-any
99.9% enterprise uptime SLA

From zero to translated audio in 5 minutes

A real-time translation API you can self-serve from minute one. Sign up, get an API key, and start translating. No sales call, no procurement cycle.

from time import sleep  # standard library; used for pacing in steps 4 and 5

# 1. Get a short-lived session key
session_key = get_vt_session_key(API_KEY)["session_key"]

# 2. Build a session config
config = VtSessionConfig(
    auth_token           = session_key,
    input_language_code  = "en-US",         # BCP-47, source
    output_language_code = "fr-FR",         # BCP-47, target
    voice                = VtVoice.FEMALE,  # output voice
    custom_vocabulary    = VtCustomVocabularyData(
        vocabulary = ["Krisp", "AcmeCorp"], # ASR boost for domain terms (optional)
        dictionary = {                      # force specific translations (optional)
            "hello": "bonjour",
            "goodbye": "au revoir",
        },
    ),
    metadata             = VtSessionMetadata(
        reference_id = "your-reference-id",  # optional correlation id for support
    ),
    background_voice_cancellation = True,
)

# 3. Open the session with callbacks
vt = Vt.create(
    config,
    original_transcript_callback   = on_source_text,       # source text
    translated_transcript_callback = on_target_text,       # translated text
    audio_result_callback          = on_translated_audio,  # translated PCM
    event_callback                 = on_event,             # flow control
    error_callback                 = on_error,             # error handling
)

# 4. Stream audio - one PCM chunk per 20ms
#    16 kHz mono s16le = 640 bytes per chunk
for chunk in pcm_chunks:
    vt.process(chunk)
    sleep(0.020)

# 5. Close when done
sleep(2.0)   # let final events land
vt.close()
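
The sample above iterates over `pcm_chunks` without showing where they come from. A minimal sketch of one way to produce them from a raw PCM buffer, using the 16 kHz mono s16le format stated on this page. The helper name `iter_pcm_chunks` is hypothetical, and padding the final short chunk with silence is an assumption, not documented API behavior.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # 16 kHz, as stated on this page
BYTES_PER_SAMPLE = 2      # s16le = 16-bit samples, mono
CHUNK_MS = 20             # one chunk per 20 ms

# 16,000 samples/s * 0.020 s = 320 samples; 320 * 2 bytes = 640 bytes
CHUNK_BYTES = SAMPLE_RATE_HZ * CHUNK_MS // 1000 * BYTES_PER_SAMPLE


def iter_pcm_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield fixed-size chunks of raw PCM.

    Assumption: the last, shorter chunk is padded with zero bytes (silence)
    so every chunk has the same length.
    """
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        if len(chunk) < chunk_bytes:
            chunk += b"\x00" * (chunk_bytes - len(chunk))
        yield chunk
```

With this helper, `pcm_chunks` in step 4 could be `iter_pcm_chunks(raw_audio)`, where `raw_audio` is a bytes object you have already decoded to 16 kHz mono s16le.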

Developer Experience

Self-serve access

Sign up, get 60 minutes of free translation credit, and start building. No sales call required.

Configure every session in a single JSON

Languages, voice, custom vocabulary, BVC, and transcripts, all controllable per session via a single config payload.

WebSocket API

Persistent bidirectional connection at wss://streaming.krisp.ai/vt with two-step auth: your API key is exchanged for a short-lived session key. Audio format is PCM S16LE, 16 kHz, mono (640 bytes per 20 ms chunk). Python and JavaScript SDKs are available with sample code; a C++ SDK is coming soon.
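
Before streaming a recorded file, it can help to confirm it already matches the format described above. The sketch below uses only Python's standard `wave` module; the function name `is_stream_ready` is hypothetical and is not part of the Krisp SDK.

```python
import io
import wave


def is_stream_ready(wav_file) -> bool:
    """Return True if the WAV is 16 kHz, mono, 16-bit PCM.

    `wav_file` may be a file path or an open binary file object.
    WAV stores 16-bit PCM as signed little-endian, i.e. s16le.
    """
    with wave.open(wav_file, "rb") as wav:
        return (
            wav.getframerate() == 16_000
            and wav.getnchannels() == 1
            and wav.getsampwidth() == 2
        )
```

A file that fails this check would need resampling or downmixing before its frames are sent as 640-byte chunks.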
Reference
Full API documentation
Read the docs
Free tier
60 min free, no sales call
Sign up to start building

See It In Action

Hear the engine on real calls

Both powered by 8 years of production audio and a trillion+ minutes processed.

English Portuguese
English Spanish
English Russian
English French
Try it yourself

Select your language pair, pick a voice, toggle BVC, and speak. Hear the translated output in real time, with live transcripts on both sides. 60 minutes of free translation credit on every account.

Try in Playground

Security & Privacy

Built with powerful, enterprise-grade security in mind

The same security posture that serves enterprise contact centers, now available to every developer building with the API.

SOC 2 Certified
GDPR Compliant
HIPAA Compliant
PCI-DSS Certified
No voice data stored on Krisp servers
Encryption in-transit and at-rest
Visit our trust center

Pricing

Predictable pricing that scales with you

Multiple subscription tiers, from self-serve to enterprise. 60 minutes of free translation credit on every new account.

| Feature | Starter | Enterprise |
| --- | --- | --- |
| Price | $249 /mo ($5.53 / hr · billed monthly) | Custom pricing |
| Hours included | 45 hrs/month | Custom |
| Concurrency | 3 | Custom |
| Overage | $7.00 / hr | n/a |
| Support | Community | Dedicated + SLA |
| Background voice cancellation | Included | Included |
| High-quality krisp-vt-pro-v1 model optimized for CX and high-precision use cases | Included | Included |
| 60+ languages, any-to-any | Included | Included |
| HIPAA, SOC 2, etc. | Included | Included |

Starter: Get API key · Enterprise: Talk to Sales
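
To see how the Starter figures listed above combine, here is a rough cost sketch. It assumes overage is charged per hour (pro-rated) for usage beyond the 45 included hours, on top of the $249 base fee; the exact billing rules are not stated on this page, and the function name is hypothetical.

```python
STARTER_BASE_USD = 249.00      # monthly fee
STARTER_INCLUDED_HOURS = 45    # hours included per month
STARTER_OVERAGE_USD = 7.00     # per hour beyond the included hours


def starter_monthly_cost(hours_used: float) -> float:
    """Estimate a month's Starter bill (assumed: base fee plus pro-rated overage)."""
    overage_hours = max(0.0, hours_used - STARTER_INCLUDED_HOURS)
    return STARTER_BASE_USD + overage_hours * STARTER_OVERAGE_USD


# Effective rate when all included hours are used: 249 / 45, about $5.53 per hour
effective_rate = round(STARTER_BASE_USD / STARTER_INCLUDED_HOURS, 2)
```

Under these assumptions, 50 hours in a month would come to $249 + 5 × $7.00 = $284.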

Need deeper voice pipeline integration?

The Translation API is one part of the Krisp audio stack. Two more SDK families are available for teams building voice-first products.

VIVA SDK

For Voice AI Agents

Voice Isolation, Turn Prediction, Interruption Prediction, and VAD: lightweight models that sit between real-world audio and your AI agent.

Explore VIVA SDK

RTC SDK

For Human-to-Human calls

Accent Conversion, Background Voice Cancellation, and Noise Cancellation: real-time processing for contact centers and communication platforms.

Explore RTC SDK

Frequently asked questions

What was this engine built for?
The Krisp engine was built inside enterprise contact centers, the most unforgiving environment for voice AI. That means it handles noisy audio, heavy accents, domain-specific terminology, and high-stakes content (names, numbers, medication names, policy details) with production-grade accuracy. The accuracy claims come from live production calls, not benchmarks on clean audio.
Is this the same engine as the enterprise product?
Yes. Same model, same accuracy, same language support. The API provides the core translation engine. Enterprise-specific operational features like AutoQA, Live Call Monitoring, and Quick Phrases are part of the enterprise product and serve contact center workflows. The API gives you the engine directly, to build your own experience on top of.
How many languages does the voice translation API support?
61 production languages including locale-specific variants: US Spanish vs. European Spanish, French Canadian vs. metropolitan French, Egyptian Arabic, Catalan, Basque, Galician, and more. The engine was benchmarked across 30 languages in 6 business domains (finance, healthcare, insurance, retail, travel, universal) with 870 conversations evaluated.
How do Custom Vocabulary and Dictionary work?
Custom Vocabulary lets you add domain-specific terms so the engine recognizes them correctly during transcription. If you're in healthcare, you add your medication names. If you're in insurance, you add your product terms. Dictionary lets you define how specific terms should translate per language pair, e.g. "copay" → "quote-part" for French. Both are configurable per session via the API.
Does the AI voice translation API work with noisy audio?
Yes. The engine includes built-in Background Voice Cancellation, the same noise handling technology that powers Krisp's standalone noise cancellation products. It handles background noise, competing voices, and room reverberation. Real-world audio from mobile phones, headsets, laptops, and call center environments works without preprocessing.
What SDKs and languages are available?
Python and JavaScript SDKs with sample code and a quickstart guide. C++ SDK is coming soon. For deeper voice pipeline integration, the VIVA and RTC SDK families are available on request.
How does speech-to-speech translation work?
Speech-to-speech translation converts spoken audio in one language into spoken audio in another language in real time. Krisp's API processes the incoming audio stream, transcribes it, translates the text, and synthesizes natural-sounding speech in the target language — all in a single pipeline with sub-second latency. Unlike text translation APIs, the input and output are both audio.

The most accurate Voice Translation API for real-world calls

If you are building an accuracy-critical solution, get your API key today.
