Real-time communication has become increasingly popular with the rise of remote work and the need for virtual meetings. WebRTC is a widely used technology that enables real-time communication in web browsers. It transmits video, audio, and data without plugins or downloads. WebRTC libraries are also available outside the browser, with native libraries for the major platforms. WebRTC takes the sound stream from the microphone and delivers it to the peer, handling encoding, decoding, and encryption along the way. The user does not need to know anything about encryption or audio codecs to use WebRTC, which is why it is so popular for building real-time communication systems. With WebRTC available, writing communication applications has become much easier.
WebRTC includes a basic, open-source noise suppression feature that the user can enable. However, it is not nearly as advanced as Krisp’s noise cancellation and voice processing technologies. Applications that require robust noise cancellation with natural voice quality opt for Krisp, while applications with less demanding voice-quality requirements opt for the open-source option.
Modifying the Audio Stream in WebRTC
WebRTC does not provide a consistent way for the application to modify the sound stream across platforms, which is a prerequisite for using the Krisp SDK. There is a way to modify the sound stream in WebRTC for Android, but the technique does not carry over to other platforms and does not represent good coding practice: it requires creating the AudioDeviceModule in the C++ layer, returning its pointer as a long value, and passing that value to the Java-level module. Clearly, this is not a pattern we want our customers to follow.
Krisp has introduced the Audio Hook feature into WebRTC, which represents a consistent approach to modifying the audio in the client application on all platforms.
The Audio Hook allows modification of the sound stream of the microphone in the client app before the sound is sent to the other peer over the network. The Krisp Application Engineering team has developed and outlined a step-by-step approach for integrating Krisp into your WebRTC application.
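To make the idea concrete, here is a minimal, language-neutral sketch of the hook pattern in C++. It is not the Krisp Audio Hook or any WebRTC API: the names AudioFrameHook, CapturePipeline, and HalveVolumeHook are hypothetical and exist only to show where the application gets a chance to edit a captured frame before it would be encoded and sent.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical hook interface. The name and signature are illustrative and
// are not taken from WebRTC or the Krisp SDK.
class AudioFrameHook {
 public:
  virtual ~AudioFrameHook() = default;
  // Modifies one frame of 16-bit PCM samples in place.
  virtual void ProcessFrame(std::vector<int16_t>& samples) = 0;
};

// Stand-in for the capture side of a call: microphone -> hook -> network.
class CapturePipeline {
 public:
  // The pipeline does not own the hook; pass nullptr to remove it.
  void SetHook(AudioFrameHook* hook) { hook_ = hook; }

  // Called once per captured frame. Returns what would be encoded and sent.
  std::vector<int16_t> OnCapturedFrame(std::vector<int16_t> samples) {
    if (hook_ != nullptr) {
      hook_->ProcessFrame(samples);  // the application edits the audio here
    }
    return samples;
  }

 private:
  AudioFrameHook* hook_ = nullptr;
};

// Trivial hook used only to make the effect visible: halves every sample.
class HalveVolumeHook : public AudioFrameHook {
 public:
  void ProcessFrame(std::vector<int16_t>& samples) override {
    for (int16_t& s : samples) s = static_cast<int16_t>(s / 2);
  }
};
```

With no hook installed, the frame passes through unchanged. Once a hook is set, every captured frame is handed to it first, which is the point at which a noise-cancellation routine would run.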
Stream Flow Diagram
The Version and Relation to Chrome
The Google WebRTC project is closely tied to Google Chrome. For several years, the two projects have used matching branch names for each release. As of March 6, 2023, the stable version of Chrome is 110 and Chrome 111 is in beta. We chose version 111 as the base for our modifications. You can use the following page to find the relation between the Chrome version and the branch version.
Implementing Krisp Audio Hook on iOS
Implementing the Krisp Audio Hook on iOS involves several steps. First, you need to create a new class that implements the RTCAudioProcessorDelegate abstraction and is responsible for implementing the audio modification logic. This class should include the frameProcess method, where you can write code to process the audio frames with Krisp’s noise-canceling technology.
Here is an example implementation of the RTCAudioProcessorDelegate abstraction in the iOS sample app:
After implementing the KrispAudioProcessor class, you need to inject it into the system using the setup static function in the RTCPeerConnectionFactory class.
After building WebRTC for iOS with the Audio Hook modifications, you can implement the RTCAudioProcessorDelegate in your iOS application. The RTCAudioProcessorDelegate is an abstract class provided by WebRTC that allows you to modify the audio frames before they are sent over the network.
To implement the Audio Hook on iOS, you need to write audio frame processing code. You can create a new class that implements the RTCAudioProcessorDelegate abstraction, which will be responsible for implementing the audio modification logic. The abstract class has a frameProcess abstract method that is called for each audio frame that needs to be processed.
Here’s an example of how you can implement the RTCAudioProcessorDelegate in an iOS application:
You can customize the audio processing logic inside the frameProcess method according to your needs. For example, you can apply noise reduction, echo cancellation, or any other audio processing technique offered by the Krisp SDK.
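As a rough illustration of what the body of such a method tends to involve, the C++ sketch below converts a 16-bit PCM frame to floating point, hands it to a filter, and converts the result back with clamping. Everything here is an assumption made for illustration: FrameProcessor and FrameFilter are hypothetical names, the filter is a placeholder for a real noise-cancellation call, and the choice of float samples is not a statement about what the Krisp SDK expects. The one WebRTC fact relied on is that its audio processing works in 10 ms frames.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Number of samples per channel in one 10 ms frame at the given sample rate.
constexpr int SamplesPer10Ms(int sample_rate_hz) { return sample_rate_hz / 100; }

// Hypothetical placeholder for a noise-cancellation routine. It cleans one
// frame of float samples in place and must not change the frame length.
using FrameFilter = std::function<void(std::vector<float>&)>;

// Hypothetical frame processor: int16 PCM in, filtered int16 PCM out.
class FrameProcessor {
 public:
  explicit FrameProcessor(FrameFilter filter) : filter_(std::move(filter)) {}

  void FrameProcess(std::vector<int16_t>& pcm) {
    // Scale to [-1.0, 1.0). Dividing by a power of two is exact in float.
    scratch_.resize(pcm.size());
    for (std::size_t i = 0; i < pcm.size(); ++i) {
      scratch_[i] = pcm[i] / 32768.0f;
    }

    filter_(scratch_);  // the actual audio processing would happen here

    // Scale back and clamp so that loud output saturates instead of wrapping.
    for (std::size_t i = 0; i < pcm.size(); ++i) {
      float v = scratch_[i] * 32768.0f;
      v = std::max(-32768.0f, std::min(32767.0f, v));
      pcm[i] = static_cast<int16_t>(std::lround(v));
    }
  }

 private:
  FrameFilter filter_;
  std::vector<float> scratch_;  // reused across frames to avoid reallocating
};
```

The scratch buffer is kept as a member because the method runs for every frame, so per-call allocation is worth avoiding. The clamp matters because a filter that raises the level can otherwise overflow the 16-bit range and produce loud clicks.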
Once you have implemented the RTCAudioProcessorDelegate, you need to inject it into the WebRTC system. The modified WebRTC for iOS introduces a new static function, setup, in the RTCPeerConnectionFactory class, which should be used to inject the RTCAudioProcessorDelegate implementation into the system.
Here’s an example of how you can inject the MyAudioProcessor into the WebRTC system in an iOS application:
By calling the setupAudioProcessorDelegate function with your audio processor instance, you are now able to modify the audio frames before they are sent to the other peer over the network.
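The injection step follows a common pattern: a static function stores the delegate once, and the audio path consults it for every frame. The C++ sketch below shows only that pattern under stated assumptions. PeerConnectionFactorySketch, AudioProcessorDelegate, and MuteDelegate are hypothetical names and do not reproduce the modified RTCPeerConnectionFactory or its real signatures.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical delegate interface, standing in for the iOS abstraction.
class AudioProcessorDelegate {
 public:
  virtual ~AudioProcessorDelegate() = default;
  virtual void FrameProcess(std::vector<int16_t>& frame) = 0;
};

// Hypothetical factory showing process-wide registration via a static call.
class PeerConnectionFactorySketch {
 public:
  // Registers the delegate. Passing nullptr removes it.
  static void SetupAudioProcessorDelegate(
      std::shared_ptr<AudioProcessorDelegate> delegate) {
    std::lock_guard<std::mutex> lock(Mutex());
    Slot() = std::move(delegate);
  }

  // Stand-in for the capture path: called for every captured frame.
  static void DeliverCapturedFrame(std::vector<int16_t>& frame) {
    std::shared_ptr<AudioProcessorDelegate> delegate;
    {
      // Copy the pointer under the lock, then call outside the lock so a
      // slow delegate cannot block registration.
      std::lock_guard<std::mutex> lock(Mutex());
      delegate = Slot();
    }
    if (delegate) delegate->FrameProcess(frame);
  }

 private:
  static std::mutex& Mutex() {
    static std::mutex m;
    return m;
  }
  static std::shared_ptr<AudioProcessorDelegate>& Slot() {
    static std::shared_ptr<AudioProcessorDelegate> slot;
    return slot;
  }
};

// Example delegate for illustration only: silences the frame and counts calls.
class MuteDelegate : public AudioProcessorDelegate {
 public:
  int frames_seen = 0;
  void FrameProcess(std::vector<int16_t>& frame) override {
    ++frames_seen;
    for (int16_t& s : frame) s = 0;
  }
};
```

Holding the delegate through a shared pointer keeps it alive while a frame is in flight, even if the application replaces it at the same moment. A production audio path would likely avoid taking a mutex on every frame; the lock is used here only to keep the sketch short and correct.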
Build WebRTC for Other Platforms
The instructions provided so far are specifically for building WebRTC with the Audio Hook feature for iOS. The Krisp Application Engineering team is actively working to implement the Audio Hook on Android. This document will be updated once the feature becomes available.
Integrating Krisp into WebRTC can be a powerful solution for web application builders who want to implement the world’s best noise reduction and voice processing technologies in their WebRTC applications. By introducing the Audio Hook and modifying the WebRTC audio stream, you can take advantage of Krisp’s powerful audio processing capabilities to enhance the audio quality of your WebRTC applications.
In this article, we have discussed the challenges of modifying the WebRTC audio stream and introduced the Krisp Audio Hook as a solution. We have provided step-by-step instructions on how to build WebRTC with the Audio Hook modifications for iOS.
We hope this article and the detailed documentation have been helpful, and that you can now enhance the audio quality of your WebRTC applications and deliver a seamless communication experience for your users.
This article was written by Aram Tatalyan, BS in Applied Mathematics and Informatics, Staff Engineer at Krisp.