AI Meeting Assistant
Back

AI Meeting Assistant

with #1 Noise Cancellation

Explore AI Meeting Assistant

AI Notetaker

AI Note Taker

Meeting Transcription

Meeting Recording

Meeting Summary

Real Time Voice AI

Noise Cancellation

Accent Conversion - Speaker side

Accent Conversion - Listener side
Call Center AI
Back

Call Center AI

AI that boosts call center productivity

Explore platform

Speech Assist

Noise Cancellation

Remove background noises, voices & echoes.

Accent Conversion

Real-time accent conversion for call center agents.

Voice Translation

Real-time AI voice translation for call center agents.

Agent & Supervisor Assist

Agent Assist

Real-time AI assistant for call center agents.

Speech Analytics

Call scoring, Compliance monitoring and more.

Voice security

Real-time fraud detection and more
Developers
Back

Developers

with #1 AI Voice Models

Explore developers

For Voice AI Agents

Voice Isolation

Isolate the primary speaker’s voice

Turn-Taking

Improving turn-taking for AI

For Human-to-human Calls

Accent Conversion

Convert accents in calls

Noise Cancellation

Noise removal in calls

Voice Translation API New

Real-time translation, self-serve
Customers
Pricing

Book a demo

Get Krisp for free

New Inside the Voice Translation API

Voice Translation API.
Built for accuracy.

Real-time speech-to-speech translation built for accuracy-critical applications. 61 languages, any-to-any pair.

Get API Key - Free

Try in Playground

96%

Accuracy on real calls

60+ Langs

with any-to-any pair

60 mins

Free sign up credit

Start speaking, we'll translate in real time.

Source language

Translate to

SOC 2 Certified

GDPR Compliant

HIPAA Compliant

PCI-DSS Certified

Organizations worldwide trust us

Translation Technology

Same translation behind Krisp CX Enterprise

Built inside enterprise contact centers, where accuracy is not optional

96% Accuracy

Most voice translation APIs report accuracy on clean benchmark recordings. Krisp's 96% comes from live enterprise calls with real customers, real accents, and real background noise.

Names, Numbers, Emails…

Policy numbers, medication names, account details, dates of birth. The kind of content that typically gets hallucinated or garbled comes through accurately.

61 Languages, Any-to-Any

Translate from any source language to any target language, including locale-specific variants like US Spanish, French Canadian, Egyptian Arabic, Catalan, Basque, and Galician.

Background Voice Cancellation

Built-in BVC handles background noise, competing voices, and reverberation. Real-world audio from mobile phones, headsets, and call center environments works without preprocessing.

Accent Robust

Indian, Hispanic, and other accented speech translates with little to no accuracy degradation.

Custom Vocabulary and Dictionary

Add your terms (medication names, product names, jargon) so the engine recognizes them, then set how each translates per language pair ("copay" → "copago" in Spanish).

The same technology powering Krisp CX Enterprise

The core engine behind live enterprise contact center deployments, now available as an API.

96%

Accuracy on live calls, real accents, real noise

1M+

Minutes of production call translation

60+

languages, any-to-any

99.9%

Enterprise Uptime SLA

Hear the engine for yourself

Try in Playground Watch demo

From zero to translated audio in 5 minutes

A real-time translation API you can self-serve from minute one. Sign up, get an API key, and start translating. No sales call, no procurement cycle.

from time import sleep  # paces the audio stream in step 4

# 1. Get a short-lived session key
session_key = get_vt_session_key(API_KEY)["session_key"]

# 2. Build a session config
config = VtSessionConfig(
    auth_token           = session_key,
    input_language_code  = "en-US",         # BCP-47, source
    output_language_code = "fr-FR",         # BCP-47, target
    voice                = VtVoice.FEMALE,  # output voice
    custom_vocabulary    = VtCustomVocabularyData(
        vocabulary = ["Krisp", "AcmeCorp"], # ASR boost for domain terms (optional)
        dictionary = {                      # force specific translations (optional)
            "hello": "bonjour",
            "goodbye": "au revoir",
        },
    ),
    metadata             = VtSessionMetadata(
        reference_id = "your-reference-id",  # optional correlation id for support
    ),
    background_voice_cancellation = True,
)

# 3. Open the session with callbacks
vt = Vt.create(
    config,
    original_transcript_callback   = on_source_text,       # source text
    translated_transcript_callback = on_target_text,       # translated text
    audio_result_callback          = on_translated_audio,  # translated PCM
    event_callback                 = on_event,             # flow control
    error_callback                 = on_error,             # error handling
)

# 4. Stream audio - one PCM chunk per 20ms
#    16 kHz mono s16le = 640 bytes per chunk
for chunk in pcm_chunks:
    vt.process(chunk)
    sleep(0.020)

# 5. Close when done
sleep(2.0)   # let final events land
vt.close()

# Source transcript - interim partials and final utterances
def on_source_text(r):
    r.transcript  # str - interim partial OR final utterance
    r.type        # INTERIM | FINAL
    r.chunk_id    # groups interim updates for one utterance
    r.duration    # ms covered by this transcript
    r.timestamp   # server-side start time

# Translated transcript - same shape as source
def on_target_text(r):
    r.transcript  # translated text
    r.type        # INTERIM | FINAL

# Translated audio - raw PCM bytes
def on_translated_audio(r):
    r.output_samples  # bytes - int16 PCM, 16 kHz mono

# Flow control events
def on_event(e):          # INPUT_ALLOWED | INPUT_NOT_ALLOWED
    ...

# Error handling with recovery hints
def on_error(e):
    if vt_is_retryable(e):
        ...              # exponential backoff, reconnect
    log(vt_recovery_hint(e))

// Single JSON config - sent once when the WebSocket session opens
// wss://streaming.krisp.ai/vt?authorization=Api-Key SESSION_KEY

{
  "config": {
    "source_language": "en-US",
    "target_language": "es-US",
    "voice": "female",

    "vocabulary": ["Lisinopril", "metformin", "HIPAA"],
    "translation_dictionary": [
      { "source": "copay", "target": "copago" },
      { "source": "referral", "target": "remisión" }
    ],

    "transcript": {
      "interim": true,
      "final": true,
      "translate": true
    },

    "features": {
      "background_voice_cancellation": true
    }
  }
}

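const { Vt, VtSessionConfig, VtCustomVocabularyData,
        VtVoice, mintVtSessionKey } = require('krisp-audio-node-sdk').vt;

// Promise-based delay used to pace the audio stream in step 4
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));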

// 1. Get a short-lived session key
const { session_key } = await mintVtSessionKey(API_KEY);

// 2. Build a session config
const config = new VtSessionConfig({
  authToken:          session_key,
  inputLanguageCode:  'en-US',          // BCP-47, source
  outputLanguageCode: 'fr-FR',          // BCP-47, target
  voice:              VtVoice.FEMALE,  // output voice
  customVocabulary: new VtCustomVocabularyData({
    vocabulary: ['Krisp', 'AcmeCorp'],  // ASR boost for domain terms (optional)
    dictionary: {                       // force specific translations (optional)
      'hello':   'bonjour',
      'goodbye': 'au revoir',
    },
  }),
});

// 3. Open the session with callbacks
const vt = await Vt.create(
  config,
  onSourceText,       // original transcript
  onTargetText,       // translated transcript
  onTranslatedAudio,  // translated PCM
  onEvent,            // flow control
  onError,            // error handling
);

// 4. Stream audio - one PCM chunk per 20 ms
//    16 kHz mono s16le = 640 bytes per chunk
for (const chunk of pcmChunks) {
  vt.process(chunk);
  await sleep(20);
}

// 5. Close when done
await sleep(2000);  // let final events land
await vt.close();

// Source transcript - interim partials and final utterances
function onSourceText(r) {
  r.transcript  // string - interim partial OR final utterance
  r.type        // VtTranscriptionType.INTERIM | FINAL
  r.chunkId     // groups interim updates for one utterance
  r.duration    // ms covered by this transcript
  r.timestamp   // server-side start time
}

// Translated transcript - same shape as source
function onTargetText(r) {
  r.transcript  // translated text
  r.type        // VtTranscriptionType.INTERIM | FINAL
}

// Translated audio - raw PCM Buffer
function onTranslatedAudio(r) {
  r.outputSamples  // Buffer - int16 PCM, 16 kHz mono
}

// Flow control events
function onEvent(e) { }  // VtEventType.INPUT_ALLOWED | INPUT_NOT_ALLOWED

// Error handling with recovery hints
function onError(e) {
  if (vtIsRetryable(e)) {
    // exponential backoff, reconnect
  }
  console.log(vtRecoveryHint(e));
}
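
Both error callbacks above leave the retry logic elided ("exponential backoff, reconnect"). A minimal Python sketch of that pattern follows. The names `backoff_delays` and `reconnect_with_backoff`, the delay values, and the attempt limit are illustrative assumptions, not part of the Krisp SDK; in practice `connect` would wrap the session-opening call and `is_retryable` would be the SDK's retryability check.

```python
import random


def backoff_delays(max_attempts=5, base=0.5, cap=8.0, jitter=0.0, rng=random.random):
    """Yield one delay in seconds per attempt: base * 2**n, capped at `cap`."""
    for attempt in range(max_attempts):
        delay = min(cap, base * (2 ** attempt))
        # Optional random jitter spreads out clients that reconnect together.
        yield delay + jitter * rng()


def reconnect_with_backoff(connect, is_retryable, sleep, **backoff_kwargs):
    """Call connect() until it succeeds, hits a non-retryable error, or runs out of attempts."""
    last_error = None
    for delay in backoff_delays(**backoff_kwargs):
        try:
            return connect()
        except Exception as err:  # hypothetical: narrow this to the SDK's error type
            if not is_retryable(err):
                raise
            last_error = err
            sleep(delay)
    if last_error is None:
        raise ValueError("max_attempts must be at least 1")
    raise last_error
```

Passing `sleep` in as a parameter keeps the helper testable: production code can hand it `time.sleep`, while tests can record the delays instead of waiting.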

Developer Experience

Self-serve access

Configure every session in a single JSON

Languages, voice, custom vocabulary, BVC, and transcripts, all controllable per session via a single config payload.
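
As a rough illustration, the payload can be assembled programmatically before it is sent. The sketch below reuses the field names from the JSON sample above; the helper name `build_vt_config` and its defaults are assumptions made for illustration, not part of any Krisp SDK.

```python
import json


def build_vt_config(source_language, target_language, voice="female",
                    vocabulary=(), dictionary=None, bvc=True):
    """Assemble a session config dict using the field names from the JSON sample."""
    return {
        "config": {
            "source_language": source_language,
            "target_language": target_language,
            "voice": voice,
            "vocabulary": list(vocabulary),
            # One {"source", "target"} entry per forced translation.
            "translation_dictionary": [
                {"source": src, "target": tgt}
                for src, tgt in (dictionary or {}).items()
            ],
            "transcript": {"interim": True, "final": True, "translate": True},
            "features": {"background_voice_cancellation": bvc},
        }
    }


payload = json.dumps(
    build_vt_config(
        "en-US", "es-US",
        vocabulary=["Lisinopril", "metformin"],
        dictionary={"copay": "copago"},
    ),
    ensure_ascii=False,  # keep accented target terms readable
)
```

The resulting `payload` string is what would be sent once when the session opens.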

WebSocket API

Persistent bidirectional connection at wss://streaming.krisp.ai/vt with two-step auth: your API key is exchanged for a short-lived session key. Audio format is PCM S16LE, 16 kHz, mono (640 bytes per 20 ms chunk). Python and JavaScript SDKs are available with sample code. C++ is coming soon.
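
The 640-byte figure follows directly from the audio format: 16,000 samples per second × 0.020 s = 320 samples, and each 16-bit mono sample takes 2 bytes. A small Python sketch of that arithmetic, plus a chunker for a raw PCM buffer, is shown below. The helper name `iter_pcm_chunks` is hypothetical, and zero-padding the last chunk is an assumption; the source does not say whether a short final chunk is accepted.

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2   # signed 16-bit little-endian
CHANNELS = 1           # mono
CHUNK_MS = 20

# 320 samples per chunk * 2 bytes * 1 channel = 640 bytes
CHUNK_BYTES = SAMPLE_RATE_HZ * CHUNK_MS // 1000 * BYTES_PER_SAMPLE * CHANNELS


def iter_pcm_chunks(pcm: bytes, chunk_bytes: int = CHUNK_BYTES):
    """Yield fixed-size chunks; the final chunk is padded with zeros (silence)."""
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        if len(chunk) < chunk_bytes:
            chunk += b"\x00" * (chunk_bytes - len(chunk))
        yield chunk
```

Each yielded chunk can then be passed to the streaming call at a 20 ms cadence, as in the quickstart loop.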

Reference

Full API documentation

Read the docs

Free tier

60 min free, no sales call

See It In Action

Hear the engine on real calls

Both powered by 8 years of production audio and a trillion+ minutes processed.

English Portuguese

English Spanish

English Russian

English French

Security & Privacy

Built with powerful, enterprise-grade security in mind

The same security posture that serves enterprise contact centers, now available to every developer building with the API.

SOC 2 Certified

GDPR Compliant

HIPAA Compliant

PCI-DSS Certified

No voice data stored on Krisp servers

Encryption in-transit and at-rest

Visit our trust center

Pricing

Predictable pricing that scales with you

Multiple subscription tiers, from self-serve to enterprise. 60 minutes of free translation credit on every new account.

Hours included

Concurrency

Overage

Support

Background voice cancellation

High quality krisp-vt-pro-v1 model optimized for CX and high precision use cases

60+ languages, any-to-any

HIPAA, SOC 2, etc.

Starter

$249 /mo

$5.53 / hr · billed monthly

Get API key

45 hrs/month

$7.00 / hr

Community

Advanced

$799 /mo

$5.33 / hr · billed monthly

Get API key

150 hrs/month

$6.50 / hr

Community + Email

Enterprise

Custom pricing

Talk to Sales

Custom

n/a

Dedicated + SLA
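
To compare tiers for a given workload, the per-hour figures can be reproduced from the plan price and included hours. The Python sketch below uses the Starter and Advanced numbers listed above; the assumption that overage is billed linearly per hour beyond the included allowance is ours, so check the actual billing terms before relying on it.

```python
PLANS = {
    "Starter":  {"monthly_usd": 249.0, "included_hours": 45,  "overage_usd_per_hour": 7.00},
    "Advanced": {"monthly_usd": 799.0, "included_hours": 150, "overage_usd_per_hour": 6.50},
}


def effective_rate(plan: str) -> float:
    """Monthly price divided by included hours, rounded to cents."""
    p = PLANS[plan]
    return round(p["monthly_usd"] / p["included_hours"], 2)


def monthly_cost(plan: str, hours_used: float) -> float:
    """Base price plus assumed linear overage for hours beyond the allowance."""
    p = PLANS[plan]
    extra_hours = max(0.0, hours_used - p["included_hours"])
    return p["monthly_usd"] + extra_hours * p["overage_usd_per_hour"]


starter_rate = effective_rate("Starter")   # 249 / 45
starter_60h = monthly_cost("Starter", 60)  # 249 + 15 * 7.00
```

Under these assumptions, 60 hours on Starter costs $354.00, well below the Advanced base price, so the higher tier only pays off at substantially heavier usage.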

Need deeper voice pipeline integration?

The Translation API is one part of the Krisp audio stack. Two more SDK families are available for teams building voice-first products.

VIVA SDK

For Voice AI Agents

Voice Isolation, Turn Prediction, Interruption Prediction, and VAD, lightweight models that sit between real-world audio and your AI agent.

Explore VIVA SDK

RTC SDK

For Human-to-Human calls

Accent Conversion, Background Voice Cancellation, and Noise Cancellation, real-time processing for contact centers and communication platforms.

Explore RTC SDK

Frequently asked questions

What was this engine built for?

The Krisp engine was built inside enterprise contact centers, the most unforgiving environment for voice AI. That means it handles noisy audio, heavy accents, domain-specific terminology, and high-stakes content (names, numbers, medication names, policy details) with production-grade accuracy. The accuracy claims come from live production calls, not benchmarks on clean audio.

Is this the same engine as the enterprise product?

Yes. Same model, same accuracy, same language support. The API provides the core translation engine. Enterprise-specific operational features like AutoQA, Live Call Monitoring, and Quick Phrases are part of the enterprise product and serve contact center workflows. The API gives you the engine directly, to build your own experience on top of.

How many languages does the voice translation API support?

61 production languages including locale-specific variants: US Spanish vs. European Spanish, French Canadian vs. metropolitan French, Egyptian Arabic, Catalan, Basque, Galician, and more. The engine was benchmarked across 30 languages in 6 business domains (finance, healthcare, insurance, retail, travel, universal) with 870 conversations evaluated.

How do Custom Vocabulary and Dictionary work?

Custom Vocabulary lets you add domain-specific terms so the engine recognizes them correctly during transcription. If you're in healthcare, you add your medication names. If you're in insurance, you add your product terms. Dictionary lets you define how specific terms should translate per language pair, e.g. "copay" → "quote-part" for French. Both are configurable per session via the API.
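
One way to manage this on the client side is to keep a single glossary keyed by language pair and pick the right overrides when each session is created. The Python sketch below is illustrative only: `GLOSSARY`, `DOMAIN_VOCABULARY`, and `session_terms` are hypothetical names, and the entries reuse the examples from this page.

```python
# Recognition boost: terms the transcriber should know, independent of target language.
DOMAIN_VOCABULARY = ["Lisinopril", "metformin", "copay"]

# Translation overrides, keyed by (source, target) language codes.
GLOSSARY = {
    ("en-US", "es-US"): {"copay": "copago"},
    ("en-US", "fr-FR"): {"copay": "quote-part"},
}


def session_terms(source_language, target_language,
                  vocabulary=DOMAIN_VOCABULARY, glossary=GLOSSARY):
    """Return (vocabulary, dictionary) for one session's language pair."""
    overrides = glossary.get((source_language, target_language), {})
    # Copies keep per-session edits from leaking into the shared tables.
    return list(vocabulary), dict(overrides)


vocab, dictionary = session_terms("en-US", "fr-FR")
```

The two return values correspond to the vocabulary and dictionary fields of the session config; a pair with no glossary entry simply gets an empty dictionary.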

Does the AI voice translation API work with noisy audio?

Yes. The engine includes built-in Background Voice Cancellation, the same noise handling technology that powers Krisp's standalone noise cancellation products. It handles background noise, competing voices, and room reverberation. Real-world audio from mobile phones, headsets, laptops, and call center environments works without preprocessing.

What SDKs and languages are available?

Python and JavaScript SDKs with sample code and a quickstart guide. C++ SDK is coming soon. For deeper voice pipeline integration, the VIVA and RTC SDK families are available on request.

How does speech-to-speech translation work?

Speech-to-speech translation converts spoken audio in one language into spoken audio in another language in real time. Krisp's API processes the incoming audio stream, transcribes it, translates the text, and synthesizes natural-sounding speech in the target language, all in a single pipeline with sub-second latency. Unlike text translation APIs, the input and output are both audio.

The most accurate Voice Translation API for real-world calls

If you are building an accuracy-critical solution, get your API key today.