


{"id":23356,"date":"2026-06-09T16:53:44","date_gmt":"2026-06-09T12:53:44","guid":{"rendered":"https:\/\/krisp.ai\/blog\/?p=23356"},"modified":"2026-06-09T16:53:44","modified_gmt":"2026-06-09T12:53:44","slug":"introducing-voice-translation-api","status":"publish","type":"post","link":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/","title":{"rendered":"Introducing the Voice Translation API: Real-Time Speech-to-Speech Translation for Developers"},"content":{"rendered":"<p><em>The engine behind Krisp&#8217;s enterprise voice translation, with 1M+ minutes of production call translation, tested across 30 languages, 6 business domains, and 870 real conversations, is now available as a self-serve API.<\/em><\/p>\n<h2>The demo-to-production gap in voice translation<\/h2>\n<p>Getting a real-time voice translation demo working is easy. Getting it to survive production is the hard part.<\/p>\n<p>Real users have accents. They speak over background noise. They use domain-specific terms that carry the most weight in the conversation: medication names, policy numbers, account details, email addresses. These are exactly the terms that general-purpose translation engines hallucinate or garble. And there&#8217;s no built-in feedback mechanism to tell you when it happens. Your first quality signal is a user complaint, a compliance flag, or a patient safety issue.<\/p>\n<p>Most voice translation APIs report accuracy on clean benchmark recordings made in studio conditions. Those numbers typically drop 5 to 10 points in production. The gap between what works in a demo and what works on a real call with a real customer is where most translation features fail.<\/p>\n<p>We built the <a href=\"https:\/\/krisp.ai\/developers\/voice-translation-api\/\">Voice Translation API<\/a> to close that gap. 
It delivers production-grade speech-to-speech translation from the same engine running in live enterprise contact centers today, now available self-serve.<\/p>\n<p>Don&#8217;t take our word for it. <a href=\"https:\/\/lab.krisp.ai\/\"><strong>Try the playground \u2192<\/strong><\/a> Speak into it, pick a language pair, and hear the output yourself. No integration needed, no signup required.<\/p>\n<h2>The engine: 96% accuracy on real calls, not studio audio<\/h2>\n<p>This is not a new engine. It is the same translation engine that powers Krisp Voice Translation in live enterprise contact centers, with 1M+ minutes of production call translation. Same model, same accuracy, same language support.<\/p>\n<p>What makes this engine different from other voice translation APIs is where it was built and how accuracy was measured. Enterprise contact centers are the most unforgiving environment for voice AI. Frustrated customers speaking fast. Background noise from open floor plans. Heavy accents across dozens of languages. Account verifications where every digit matters. 
Calls where a translation error means a compliance violation, a disputed claim, or a patient safety incident.<\/p>\n<p>That pressure produced an engine backed by production data no benchmark can replicate:<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<thead>\n<tr>\n<th style=\"width: 50%;\">Metric<\/th>\n<th style=\"width: 50%;\">Result<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td id=\"SMwK\">Translation accuracy<\/td>\n<td>96% on live calls with real accents and noise (AutoQA-scored)<\/td>\n<\/tr>\n<tr>\n<td>Calls handled without an interpreter<\/td>\n<td>89% end-to-end<\/td>\n<\/tr>\n<tr>\n<td>Patient safety incidents<\/td>\n<td>Zero (across 8+ languages in a healthcare deployment)<\/td>\n<\/tr>\n<tr>\n<td>Average handle time (AHT) reduction vs. human language services<\/td>\n<td>20%+<\/td>\n<\/tr>\n<tr>\n<td>Interpreter wait time reduction<\/td>\n<td>More than 2x<\/td>\n<\/tr>\n<tr>\n<td>Production minutes translated<\/td>\n<td>1M+<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>These numbers come from real calls with real consequences, not curated test sets. Most speech-to-speech translation APIs report accuracy on clean benchmark audio recorded in studio conditions. We measure where it counts.<\/p>\n<h2>Benchmark data: 30 languages, 6 domains, 870 conversations<\/h2>\n<p>Beyond production deployments, the engine has been evaluated through three independent validation layers: automated benchmarking, AI-driven semantic scoring (AutoQA), and bilingual human review by professional linguists across 8 languages.<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<thead>\n<tr>\n<th style=\"width: 50%;\">Metric<\/th>\n<th style=\"width: 50%;\">Result<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>English transcription word error rate (WER)<\/td>\n<td>~2.7% (about 97 of every 100 words correct)<\/td>\n<\/tr>\n<tr>\n<td>Target-language transcription word error rate (WER)<\/td>\n<td>2\u201310% for most languages<\/td>\n<\/tr>\n<tr>\n<td>Translation quality (BLEU), top languages<\/td>\n<td>51\u201366 (human translations typically score ~60)<\/td>\n<\/tr>\n<tr>\n<td>Semantic accuracy (AutoQA)<\/td>\n<td>94\u201396 out of 100 across all 30 benchmarked languages<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>How we measured it<\/h2>\n<p><strong>Transcription<\/strong> was measured using Word Error Rate (WER), the industry standard for speech recognition accuracy. Top languages like Italian (2.07%) and Spanish (2.11%) achieve WER under 2.5%.<\/p>\n<p><strong>Translation<\/strong> was measured using BLEU, scored bidirectionally (English\u2192target and target\u2192English). We also used chrF++, a character n-gram metric that complements BLEU for morphologically complex languages like Turkish, Finnish, and Hungarian, where word-level BLEU alone can understate quality.<\/p>\n<p><strong>AutoQA<\/strong>, Krisp&#8217;s semantic scoring system, independently validated every conversation across four dimensions: intent accuracy (35% weight), entity accuracy (30%), conversation flow (25%), and naturalness (10%). 
Scores averaged 94\u201396 across all 30 languages.<\/p>\n<p><strong>Bilingual human review<\/strong> by professional linguists across 8 languages independently confirmed the automated findings.<\/p>\n<h2>What the API does<\/h2>\n<p>Speech in one language, speech and text out in another. Real-time, synchronous, built for live conversations.<\/p>\n<p>Here&#8217;s the simplest integration using the Python SDK. Configure a session, open it with callbacks, and stream audio:<\/p>\n<pre><code># Assumes VtSessionConfig, VtVoice, and Vt are imported from the Krisp Python SDK,\n# session_key is a short-lived session key (see Authentication below),\n# and on_audio and on_text are your own callback functions.\n\n# Configure and open a session\nconfig = VtSessionConfig(\n    auth_token=session_key,\n    input_language_code=\"en-US\",\n    output_language_code=\"es-US\",\n    voice=VtVoice.FEMALE,\n)\n\nvt = Vt.create(\n    config,\n    audio_result_callback=on_audio,  # receives translated audio\n    translated_transcript_callback=on_text,  # receives translated text\n)\n\n# Stream audio in, get speech + text out.\n# pcm_chunks: 16 kHz mono PCM S16LE audio, e.g. 640-byte (20 ms) chunks\nfor chunk in pcm_chunks:\n    vt.process(chunk)<\/code><\/pre>\n<p>That&#8217;s it. Speech in, translated speech and text out. Python and JavaScript SDKs ship with sample code and a quickstart guide. From zero to translated audio in 5 minutes.<\/p>\n<p>Here&#8217;s what the engine handles that most real-time translation APIs leave to you:<\/p>\n<p><strong>Built-in Background Voice Cancellation.<\/strong> Background noise, competing voices, and reverberation, the conditions that degrade translation quality in every real-world deployment, are handled before translation begins. It&#8217;s configurable via the API, and you don&#8217;t need clean audio input.<\/p>\n<p><strong>Native accent robustness.<\/strong> Indian-accented English, Hispanic-accented English, regional accents across every supported language. 
Accuracy doesn&#8217;t degrade. The engine was built on the full spectrum of how people actually speak, not how they speak in recording studios.<\/p>\n<p><strong>Accurate handling of names, numbers, and emails.<\/strong> Policy numbers, medication names, account details, email addresses, dates of birth: the kind of content that typically gets hallucinated or garbled comes through accurately.<\/p>\n<p><strong>61 languages with any-to-any pairs.<\/strong> Not just &#8220;Spanish&#8221;: US Spanish and European Spanish are distinct locales, and the engine distinguishes between them. The same goes for Canadian and metropolitan French, plus Egyptian Arabic and regional languages like Catalan, Galician, and Basque. The full list is available via the languages endpoint and is updated dynamically.<\/p>\n<p><strong>Real-time transcripts.<\/strong> Interim, final, and translated transcripts are streamed alongside translated audio, and each can be toggled independently in the session config.<\/p>\n<h2>Under the hood: technical details<\/h2>\n<h3>Authentication<\/h3>\n<p>Two-step authorization keeps your long-lived API key off the client-side connection:<\/p>\n<ol>\n<li><strong>API Key<\/strong> from the developer dashboard, used to generate short-lived session keys.<\/li>\n<li><strong>Session Key<\/strong>, a temporary scoped token with configurable TTL (5 minutes to 24 hours). 
Passed as a query parameter when opening the WebSocket.<\/li>\n<\/ol>\n<pre><code>GET https:\/\/api.developers.krisp.ai\/v2\/sdk\/voice-translation\/session\/token?expiration_ttl=100\nAuthorization: api-key API_KEY<\/code><\/pre>\n<p>The long-lived key never touches the WebSocket connection directly.<\/p>\n<h3>Session configuration<\/h3>\n<p>Every session is controlled through a single JSON config message sent after the WebSocket connects. Source and target language, output voice, custom vocabulary, translation dictionary, transcript toggles, background voice cancellation, and client metadata are all set in one message:<\/p>\n<pre><code>{\n  \"config\": {\n    \"source_language\": \"en-US\",\n    \"target_language\": \"es-US\",\n    \"voice\": \"female\",\n\n    \"vocabulary\": [\"Lisinopril\", \"metformin\", \"HIPAA\"],\n    \"translation_dictionary\": [\n      { \"source\": \"copay\", \"target\": \"copago\" },\n      { \"source\": \"referral\", \"target\": \"remisi\u00f3n\" }\n    ],\n\n    \"transcript\": {\n      \"interim\": true,\n      \"final\": true,\n      \"translate\": true\n    },\n\n    \"features\": {\n      \"background_voice_cancellation\": true\n    }\n  }\n}<\/code><\/pre>\n<h3>Domain customization from day one<\/h3>\n<p><strong>Custom Vocabulary<\/strong> improves transcription accuracy. Add terms the engine should recognize: product names, medical terminology, internal codes. If you&#8217;re in healthcare, you add your medication names. 
If you&#8217;re in insurance, you add your product terms.<\/p>\n<p><strong>Translation Dictionary<\/strong> controls how recognized terms are translated. Define specific source \u2192 target mappings per language pair: map &#8220;copay&#8221; to &#8220;copago&#8221; in Spanish, or &#8220;deductible&#8221; to &#8220;Selbstbehalt&#8221; in German. You control both recognition and translation output.<\/p>\n<p>Both are configured per session via the JSON config. No training step, no fine-tuning, no waiting. Add your terms and they&#8217;re active immediately.<\/p>\n<h3>Server events<\/h3>\n<p>Three event types come back alongside translated audio frames:<\/p>\n<ul>\n<li><strong>Transcript<\/strong>: real-time source transcription (interim and final), with utterance ID, timestamp, and duration<\/li>\n<li><strong>Translation<\/strong>: translated text linked to each transcript via utterance ID<\/li>\n<li><strong>Error<\/strong>: HTTP-style codes (400, 401, 402, 429, 500) with reason and description<\/li>\n<\/ul>\n<h3>Audio format<\/h3>\n<p>PCM S16LE (signed 16-bit little-endian), 16 kHz, mono. A 20 ms chunk is 320 samples \u00d7 2 bytes = 640 bytes. Translated audio returns in the same encoding. Additional formats are coming soon.<\/p>\n<h2>Security and compliance<\/h2>\n<p>The API carries the same security posture that serves Krisp&#8217;s enterprise contact center deployments. No voice data is stored on Krisp servers, and data is encrypted in transit and at rest.<\/p>\n<p><strong>Certifications and compliance:<\/strong> SOC 2 Type II \u00b7 HIPAA \u00b7 GDPR \u00b7 PCI-DSS 4.0<\/p>\n<p>For full details, visit the <a href=\"https:\/\/krisp.ai\/trust-center\/\">Krisp Trust Center<\/a>.<\/p>\n<h2>Where accuracy-critical voice translation fits<\/h2>\n<p>Not every voice translation use case demands the same level of accuracy. 
The Krisp engine was built for environments where translation errors have real consequences, and that&#8217;s where its production provenance matters most.<\/p>\n<p><strong>Accuracy is critical.<\/strong> Healthcare, legal, emergency services, pharmaceutical. A mistranslated medication name can become a patient safety incident. A garbled legal term can change the outcome of a proceeding. A misunderstood 911 call costs time that someone doesn&#8217;t have.<\/p>\n<p><strong>Accuracy has financial or compliance consequences.<\/strong> Insurance, financial services, government services, enterprise procurement. Mandated disclosures, transaction details, and policy terms must land correctly in the customer&#8217;s language.<\/p>\n<p><strong>Accuracy drives business outcomes.<\/strong> Customer support for complex products, cross-language sales, B2B meetings, HR and recruiting. Translation quality, accumulated across every conversation, directly affects CSAT, close rates, resolution rates, and trust.<\/p>\n<p>For gaming, social apps, streaming, and travel, the engine works well, but the buying criteria are different: latency, naturalness, language coverage, and developer experience (DX) matter more than accuracy provenance.<\/p>\n<h2>Pricing: self-serve to enterprise<\/h2>\n<p><strong>Self-Serve: Get Started.<\/strong> 60 minutes of free translation credit included. Full engine access (same model as enterprise), 61 languages with locale variants, Custom Vocabulary and Translation Dictionary, Python and JavaScript SDKs, developer dashboard and playground. No sales call required.<\/p>\n<p><strong>Subscription: Production.<\/strong> Everything in self-serve, plus included translation hours at a predictable monthly cost that scales with your usage. Usage monitoring and billing dashboard.<\/p>\n<p><strong>Enterprise: Custom.<\/strong> Volume pricing, dedicated support with a 99.9% uptime SLA, VIVA and RTC SDK access, custom integration support. 
<a href=\"https:\/\/krisp.ai\/contact-sales\/\">Talk to Sales \u2192<\/a><\/p>\n<h2>Need deeper voice pipeline integration?<\/h2>\n<p>The Voice Translation API is one part of the Krisp audio stack. Two more SDK families are available for teams building voice-first products.<\/p>\n<p><strong>VIVA SDK for Voice AI Agents.<\/strong> Voice Isolation, Turn Prediction, Interruption Prediction, and Voice Activity Detection (VAD). Lightweight models that sit between real-world audio and your AI agent. <a href=\"https:\/\/krisp.ai\/developers\/#viva-sdk\">Explore VIVA SDK \u2192<\/a><\/p>\n<p><strong>RTC SDK for Human-to-Human Calls.<\/strong> Accent Conversion, Background Voice Cancellation, and Noise Cancellation. Real-time audio processing for contact centers and communication platforms. <a href=\"https:\/\/krisp.ai\/developers\/#rtc-sdk\">Explore RTC SDK \u2192<\/a><\/p>\n<h2>What&#8217;s coming next<\/h2>\n<p><strong>Auto language detection.<\/strong> Automatic source language identification, so developers don&#8217;t need to specify it per session.<\/p>\n<p><strong>Voice cloning.<\/strong> Preserve the speaker&#8217;s original voice in the translated output.<\/p>\n<p><strong>Additional audio formats<\/strong> beyond the current PCM S16LE, 16 kHz, mono.<\/p>\n<h2>Start building<\/h2>\n<p>This engine was built inside enterprise contact centers, on calls where a wrong word means a patient safety incident, a disputed insurance claim, or a compliance violation. 96% accuracy measured on live calls, not studio audio. 1M+ minutes of production translation. 30 languages benchmarked across 6 domains with AutoQA scores of 94\u201396. BLEU scores on par with professional human translation.<\/p>\n<p>The access model changed. 
The engine didn&#8217;t.<\/p>\n<div class=\"btn_set\">\n<div class=\"button btn--dark\">\n        <a class=\"btn_set_link\" href=\"https:\/\/developers.krisp.ai\">Get API Key Free<\/a>\n    <\/div>\n<div class=\"button btn--outline outline--dark\">\n        <a class=\"btn_set_link\" href=\"https:\/\/lab.krisp.ai\/\">Try in Playground<\/a>\n    <\/div>\n<\/div>\n<h2>FAQ<\/h2>\n<div class=\"faq_item\">\n<div class=\"faq_title text_body--md text--semi-bold\"><strong>How accurate is Krisp&#8217;s voice translation API?<\/strong><\/div>\n<div class=\"faq_answer text_body--md\">It delivers 96% accuracy measured on real enterprise calls, not studio audio. 
That figure comes from 1M+ minutes of production call translation in live contact centers, where heavy accents, background noise, and high-stakes details like policy numbers and medication names test accuracy under real conditions. Most voice translation APIs report accuracy on clean benchmark recordings, and those numbers typically drop 5 to 10 points in production.<\/div>\n<\/div>\n<div class=\"faq_item\">\n<div class=\"faq_title text_body--md text--semi-bold\"><strong>How is voice translation accuracy measured?<\/strong><\/div>\n<div class=\"faq_answer text_body--md\">Krisp uses three independent validation layers. First, automated benchmarking: transcription is scored with Word Error Rate (WER), where top languages like Italian (2.07%) and Spanish (2.11%) achieve WER under 2.5%, and translation is scored with BLEU bidirectionally (English\u2192target and target\u2192English), plus chrF++ for morphologically complex languages like Turkish, Finnish, and Hungarian. Second, Krisp&#8217;s AutoQA system rates every conversation across intent accuracy (35%), entity accuracy (30%), conversation flow (25%), and naturalness (10%), averaging 94\u201396 across all 30 benchmarked languages. Third, bilingual professional linguists across 8 languages confirm the results.<\/div>\n<\/div>\n<div class=\"faq_item\">\n<div class=\"faq_title text_body--md text--semi-bold\"><strong>Why do voice translation APIs lose accuracy in production?<\/strong><\/div>\n<div class=\"faq_answer text_body--md\">Because real conditions look nothing like a demo. Real users have accents, speak over background noise, and use domain-specific terms (medication names, policy numbers, account details) that general-purpose engines tend to hallucinate or garble. Most APIs report accuracy on clean studio recordings, so their numbers typically fall 5 to 10 points once they hit real calls. 
Krisp built and measured its engine inside enterprise contact centers, the most unforgiving environment for voice AI, so its reported accuracy reflects the conditions you&#8217;ll actually deploy in.<\/div>\n<\/div>\n<div class=\"faq_item\">\n<div class=\"faq_title text_body--md text--semi-bold\"><strong>How is this different from Krisp&#8217;s enterprise voice translation?<\/strong><\/div>\n<div class=\"faq_answer text_body--md\">It&#8217;s the same engine, not a new one: the same model, accuracy, and language support that power Krisp Voice Translation in live enterprise contact centers today, with 1M+ minutes of production call translation behind it. The only thing that changed is the access model. It&#8217;s now available self-serve via API, with Python and JavaScript SDKs, a developer dashboard and playground, and 60 minutes of free translation credit to start.<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The engine behind Krisp&#8217;s enterprise voice translation, with 1M+ minutes of production call translation, tested across 30 languages, 6 business domains, and 870 real conversations, is now available as a self-serve API. The demo-to-production gap in voice translation Getting a real-time voice translation demo working is easy. 
Getting it to survive production is the [&hellip;]<\/p>\n","protected":false},"author":71,"featured_media":23371,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"two_page_speed":[]},"categories":[421,1,588],"tags":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v24.2 (Yoast SEO v23.6) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Krisp Launches a Voice Translation API for Developers<\/title>\n<meta name=\"description\" content=\"Krisp&#039;s enterprise voice translation engine \u2014 1M+ minutes of production calls, 96% accuracy on real audio, 61 languages. Try it free!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Krisp Launches a Voice Translation API for Developers\" \/>\n<meta property=\"og:description\" content=\"Krisp&#039;s enterprise voice translation engine \u2014 1M+ minutes of production calls, 96% accuracy on real audio, 61 languages. 
Try it free!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/\" \/>\n<meta property=\"og:site_name\" content=\"Krisp\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/krispHQ\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-06-09T12:53:44+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1744\" \/>\n\t<meta property=\"og:image:height\" content=\"800\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Krisp Engineering Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@krispHQ\" \/>\n<meta name=\"twitter:site\" content=\"@krispHQ\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/\"},\"author\":{\"name\":\"Krisp Engineering Team\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/person\/e9f59158d89de3002958d323d2e788f5\"},\"headline\":\"Introducing the Voice Translation API: Real-Time Speech-to-Speech Translation for 
Developers\",\"datePublished\":\"2026-06-09T12:53:44+00:00\",\"dateModified\":\"2026-06-09T12:53:44+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/\"},\"wordCount\":1934,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp\",\"articleSection\":[\"Engineering Blog\",\"Krisp News\",\"Voice Translation\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/\",\"url\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/\",\"name\":\"Krisp Launches a Voice Translation API for Developers\",\"isPartOf\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp\",\"datePublished\":\"2026-06-09T12:53:44+00:00\",\"dateModified\":\"2026-06-09T12:53:44+00:00\",\"description\":\"Krisp's enterprise voice translation engine \u2014 1M+ minutes of production calls, 96% accuracy on real audio, 61 languages. 
Try it free!\",\"breadcrumb\":{\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#primaryimage\",\"url\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp\",\"contentUrl\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp\",\"width\":1744,\"height\":800,\"caption\":\"Krisp Voice translation API for developers\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/krisp.ai\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Introducing the Voice Translation API: Real-Time Speech-to-Speech Translation for 
Developers\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/krisp.ai\/blog\/#website\",\"url\":\"https:\/\/krisp.ai\/blog\/\",\"name\":\"Krisp\",\"description\":\"Blog\",\"publisher\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/krisp.ai\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/krisp.ai\/blog\/#organization\",\"name\":\"Krisp\",\"url\":\"https:\/\/krisp.ai\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png\",\"contentUrl\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png\",\"width\":696,\"height\":696,\"caption\":\"Krisp\"},\"image\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/krispHQ\/\",\"https:\/\/x.com\/krispHQ\",\"https:\/\/www.linkedin.com\/company\/krisphq\/\",\"https:\/\/www.youtube.com\/channel\/UCAMZinJdR9P33fZUNpuxXtg\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/person\/e9f59158d89de3002958d323d2e788f5\",\"name\":\"Krisp Engineering Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/26475ad8219056696662f819691ee49d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/26475ad8219056696662f819691ee49d?s=96&d=mm&r=g\",\"caption\":\"Krisp Engineering Team\"},\"url\":\"https:\/\/krisp.ai\/blog\/author\/eng-team\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Krisp Launches a Voice Translation API for Developers","description":"Krisp's enterprise voice translation engine \u2014 1M+ minutes of production calls, 96% accuracy on real audio, 61 languages. Try it free!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/","og_locale":"en_US","og_type":"article","og_title":"Krisp Launches a Voice Translation API for Developers","og_description":"Krisp's enterprise voice translation engine \u2014 1M+ minutes of production calls, 96% accuracy on real audio, 61 languages. Try it free!","og_url":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/","og_site_name":"Krisp","article_publisher":"https:\/\/www.facebook.com\/krispHQ\/","article_published_time":"2026-06-09T12:53:44+00:00","og_image":[{"width":1744,"height":800,"url":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp","type":"image\/webp"}],"author":"Krisp Engineering Team","twitter_card":"summary_large_image","twitter_creator":"@krispHQ","twitter_site":"@krispHQ","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#article","isPartOf":{"@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/"},"author":{"name":"Krisp Engineering Team","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/person\/e9f59158d89de3002958d323d2e788f5"},"headline":"Introducing the Voice Translation API: Real-Time Speech-to-Speech Translation for 
Developers","datePublished":"2026-06-09T12:53:44+00:00","dateModified":"2026-06-09T12:53:44+00:00","mainEntityOfPage":{"@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/"},"wordCount":1934,"commentCount":0,"publisher":{"@id":"https:\/\/krisp.ai\/blog\/#organization"},"image":{"@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#primaryimage"},"thumbnailUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp","articleSection":["Engineering Blog","Krisp News","Voice Translation"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/","url":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/","name":"Krisp Launches a Voice Translation API for Developers","isPartOf":{"@id":"https:\/\/krisp.ai\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#primaryimage"},"image":{"@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#primaryimage"},"thumbnailUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp","datePublished":"2026-06-09T12:53:44+00:00","dateModified":"2026-06-09T12:53:44+00:00","description":"Krisp's enterprise voice translation engine \u2014 1M+ minutes of production calls, 96% accuracy on real audio, 61 languages. 
Try it free!","breadcrumb":{"@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#primaryimage","url":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp","contentUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2026\/06\/ezgif-24551c6dce50ec3c.webp","width":1744,"height":800,"caption":"Krisp Voice translation API for developers"},{"@type":"BreadcrumbList","@id":"https:\/\/krisp.ai\/blog\/introducing-voice-translation-api\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/krisp.ai\/blog\/"},{"@type":"ListItem","position":2,"name":"Introducing the Voice Translation API: Real-Time Speech-to-Speech Translation for Developers"}]},{"@type":"WebSite","@id":"https:\/\/krisp.ai\/blog\/#website","url":"https:\/\/krisp.ai\/blog\/","name":"Krisp","description":"Blog","publisher":{"@id":"https:\/\/krisp.ai\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/krisp.ai\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/krisp.ai\/blog\/#organization","name":"Krisp","url":"https:\/\/krisp.ai\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png","contentUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png","width":696,"height":696,"caption":"Krisp"},"image":{"@id":"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/
krispHQ\/","https:\/\/x.com\/krispHQ","https:\/\/www.linkedin.com\/company\/krisphq\/","https:\/\/www.youtube.com\/channel\/UCAMZinJdR9P33fZUNpuxXtg"]},{"@type":"Person","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/person\/e9f59158d89de3002958d323d2e788f5","name":"Krisp Engineering Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/26475ad8219056696662f819691ee49d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/26475ad8219056696662f819691ee49d?s=96&d=mm&r=g","caption":"Krisp Engineering Team"},"url":"https:\/\/krisp.ai\/blog\/author\/eng-team\/"}]}},"primary_category":"Engineering Blog","_links":{"self":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts\/23356"}],"collection":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/users\/71"}],"replies":[{"embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/comments?post=23356"}],"version-history":[{"count":24,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts\/23356\/revisions"}],"predecessor-version":[{"id":23393,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts\/23356\/revisions\/23393"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/media\/23371"}],"wp:attachment":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/media?parent=23356"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/categories?post=23356"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/tags?post=23356"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}