The Power of On-Device Transcription in Call Centers

Dec 20, 2023

Written by Davit Baghdasaryan

Introduction

With advances in speech-to-text AI and on-device AI, the call center industry is approaching a major change. It is time to rethink the traditional cloud-based approach to transcription and bring the process directly onto agents’ devices.

Let’s dive into why this makes sense for modern call centers and BPOs.

Cost Benefits: The most immediate benefit of on-device transcription is cost savings. Cloud transcription services can be expensive because they incur costs for transmitting audio and processing it on remote servers. On-device transcription instead uses the processing power of the agent’s device, which can significantly reduce these costs.
Security Enhancement: Security is a top concern in the call center world, especially when dealing with sensitive customer data. With on-device transcription, the audio is not sent to third-party services, which reduces the risk of data breaches. On-device processing also makes it easier to comply with stringent data protection regulations, offering peace of mind to both call centers and their customers.
On-Device PII Redaction: This method also allows for direct PII (Personally Identifiable Information) redaction on the agent’s device. This enables call centers to use the data appropriately while adhering to customer privacy and industry regulations.
Live Transcription: Another exciting aspect of on-device transcription is its application in providing low-latency live transcription for agents. This feature can be a game-changer, enabling agents to see a real-time transcript of the call. When integrated with existing agent-assist systems, agents can instantly receive suggestions and accurate responses, enhancing both efficiency and customer satisfaction.
Support for all SoftPhones: Since the transcriptions and recordings are performed on the agent’s device, it can support any SoftPhone, Dialer, or CCaaS solution the call center chooses to work with, making it application-agnostic.
Unified Experience and Storage: Many call centers (such as BPOs) must support multiple SoftPhones for their customers. With on-device transcription, all transcription and recording data can be saved in unified storage, giving the call center a further opportunity to streamline the agent experience across different communication applications and to integrate easily with other systems.
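
To make the on-device PII redaction idea above more concrete, here is a minimal sketch in Python. It is not Krisp’s implementation: the patterns, labels, and the function name redact_pii are assumptions chosen for illustration, and production redaction typically relies on trained models that cover far more than these three cases.

```python
import re

# Hypothetical, deliberately narrow patterns for illustration only.
# Real PII detection must also handle names, addresses, spoken digits, etc.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    # North American style 10-digit number such as 555-123-4567.
    "PHONE": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace each matched span with a bracketed label such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    line = "My card is 4111 1111 1111 1111 and my email is jane@example.com"
    print(redact_pii(line))
```

Because the function runs locally on the transcript text, only the redacted version would need to leave the agent’s device.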

Challenges and Solutions

Historically, on-device transcription wasn’t an option for several reasons:

Hardware Limitations: The CPU requirements for on-device transcription could exceed the capabilities of existing hardware in call centers, requiring costly upgrades.

However, this challenge is being rapidly addressed by two factors:

In recent years, speech-to-text technologies have become significantly more efficient in their CPU and memory requirements, which makes them capable of running locally.
AI-powered laptops and workstations from Intel, AMD and Qualcomm, designed to handle such tasks efficiently, are arriving on the market.

Integration Challenges: Historically, integrating on-device transcription with various call center systems and software was complex, requiring significant technical effort and resources.

However, now that cloud systems are widely integrated into call centers, this is much less of a challenge. The transcriptions and recordings generated on the agent’s device can be uploaded to AWS, Google, Azure or other cloud storage and then fed into Call Center AI solutions such as CallMiner, Observe AI, Amazon Connect, NICE, etc.

Low Transcription Quality: Historically, on-device transcription yielded lower-quality results because only small models could be deployed on the device. Devices also did not always have access to the latest AI models and updates available to cloud-based systems, leading to less sophisticated transcription capabilities.

However, progress in speech-to-text models and the availability of better CPUs make it possible to deploy and run efficient, high-quality models on-device.

System Architecture

At Krisp, we utilize advanced Speech-to-text models that operate directly on-device. These models are not only highly efficient but also produce high-quality transcriptions. Importantly, they are designed to redact PII and are compatible with over 100 call center communication applications.

The following diagram shows how Krisp would typically be deployed in call centers.

Automatically install Krisp on all agents’ devices.
Krisp will transcribe and record all agent calls from any CX solution or SoftPhone (e.g. Genesys, Talkdesk, Avaya, etc.).
Krisp will redact PII on-device and upload the final transcription and recording to the call center’s preferred storage (e.g. S3, Azure, FTP, etc.).
The transcription and recording will be fed to Call Center AI for further processing.
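
The flow above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not Krisp’s actual API: process_call, CallRecord, and the transcribe and redact callables are hypothetical names, and a local directory stands in for the call center’s preferred storage (S3, Azure, FTP, etc.). Uploading the audio recording is omitted for brevity.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable


@dataclass
class CallRecord:
    call_id: str
    transcript: str  # already redacted by the time it is stored


def process_call(
    call_id: str,
    audio: bytes,
    transcribe: Callable[[bytes], str],  # hypothetical on-device STT engine
    redact: Callable[[str], str],        # hypothetical on-device PII redactor
    storage_dir: Path,                   # stand-in for S3, Azure, FTP, etc.
) -> Path:
    """Transcribe locally, redact locally, then write only the redacted text."""
    raw_text = transcribe(audio)
    record = CallRecord(call_id=call_id, transcript=redact(raw_text))
    out_path = storage_dir / f"{call_id}.json"
    out_path.write_text(json.dumps(asdict(record)))
    return out_path


if __name__ == "__main__":
    # Stub stages so the sketch runs without any speech model installed.
    def fake_transcribe(audio: bytes) -> str:
        return "my number is 555-123-4567"

    def fake_redact(text: str) -> str:
        return text.replace("555-123-4567", "[PHONE]")

    with tempfile.TemporaryDirectory() as tmp:
        path = process_call("call-001", b"\x00\x01", fake_transcribe, fake_redact, Path(tmp))
        print(path.read_text())
```

Passing the stages in as callables keeps the sketch independent of any particular SoftPhone or storage backend, which mirrors the application-agnostic deployment described in the steps.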

Conclusion

On-device transcription is poised to mark a practical shift in call center operational strategy, one focused on enhanced security, cost-effectiveness, and improved efficiency.