


{"id":9833,"date":"2023-02-13T22:24:22","date_gmt":"2023-02-13T18:24:22","guid":{"rendered":"https:\/\/krisp.ai\/blog\/?p=9833"},"modified":"2025-03-12T11:43:19","modified_gmt":"2025-03-12T07:43:19","slug":"speech-enhancement-review-krisp-use-case","status":"publish","type":"post","link":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/","title":{"rendered":"Speech Enhancement Review: Krisp Use Case"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Imagine you have an important online meeting, and there is a lot of noise around you. Kids are playing, the dog is barking, the washing machine is running, a fan is turned on, there is construction happening nearby, and you need to join a call. More often than not, it is nearly impossible to stop the noise or find a quiet place. In such situations, we need special audio processing technology that can remove background noises to improve the quality of online meetings.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This is one of the best applications of speech enhancement. Here we will discuss speech enhancement technology, give some historical background, review existing approaches, cover the challenges surrounding real-time communication, and explore how Krisp\u2019s speech enhancement algorithm is an ideal solution.\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">First, let\u2019s define Speech Enhancement (SE). It improves the quality of a noisy speech signal by reducing or removing background noises (see Figure 1). The main goal is to improve the perceptual quality and intelligibility of speech distorted by noise.<\/span><\/p>\n<div id=\"attachment_9835\" style=\"width: 709px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-9835\" loading=\"lazy\" class=\" wp-image-9835\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.09.49-380x87.png\" alt=\"Figure 1. 
Speech enhancement\" width=\"699\" height=\"160\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.09.49-380x87.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.09.49-300x69.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.09.49-768x176.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.09.49-1536x353.png 1536w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.09.49-600x138.png 600w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.09.49.png 1776w\" sizes=\"(max-width: 699px) 100vw, 699px\" \/><\/p>\n<p id=\"caption-attachment-9835\" class=\"wp-caption-text\"><em>Figure 1. Speech enhancement.<\/em><\/p>\n<\/div>\n<p><span style=\"font-weight: 400;\">We sometimes find other terms used interchangeably with speech enhancement, such as noise cancellation (NC), noise reduction, noise suppression, and speech separation.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are lots of applications for speech enhancement algorithms, including:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Voice communication, such as in conferencing apps, mobile phones, voice chats, and others. SE algorithms improve speech intelligibility for speakers in noisy environments, such as restaurants, offices, or crowded streets.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Improving other types of audio processing algorithms by making them more noise-robust. 
For instance, we can apply speech enhancement prior to passing a signal to systems like speech recognition, speaker identification, speech emotion recognition, voice conversion, etc.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Hearing aids. For those with hearing impairments, speech may be completely inaudible in noisy environments. Reducing noise increases intelligibility.<\/span><\/li>\n<\/ol>\n<h2><b>Traditional approaches<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">The <\/span><a href=\"https:\/\/www.semanticscholar.org\/paper\/Enhancement-and-bandwidth-compression-of-noisy-Lim-Oppenheim\/066779ead800b590b0957aa8c70bc77cc7266fab\"><span style=\"font-weight: 400;\">first results<\/span><\/a><span style=\"font-weight: 400;\"> of research centered around speech enhancement were obtained in the 1970s. Traditional approaches were based on statistical assumptions and mathematical modeling of the problem. Their solutions also depend largely on the application, noise types, acoustic conditions, signal-to-noise ratio, and the number of available microphones (channels). Let\u2019s discuss them in more detail.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In general, we can divide speech enhancement algorithms into two types: <\/span><i><span style=\"font-weight: 400;\">multi-channel<\/span><\/i><span style=\"font-weight: 400;\"> and <\/span><i><span style=\"font-weight: 400;\">single-channel<\/span><\/i><span style=\"font-weight: 400;\"> (mono).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The multi-channel case involves two or more microphones (channels). In this case, the extra channel(s) contain information on the noise signal and can help to reduce the noise signal in the primary channel. 
An example of such a method is <\/span><a href=\"https:\/\/www.researchgate.net\/publication\/2994278_Adaptive_Noise_Cancelling_Principles_and_Applications\"><i><span style=\"font-weight: 400;\">adaptive noise filtering<\/span><\/i><\/a><span style=\"font-weight: 400;\">.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This technique uses a reference signal from the auxiliary (secondary) microphone as an input to an adaptive digital filter, which estimates the noise in the primary signal and cancels it out (see Figure 2). Unlike a fixed filter, the adaptive filter automatically adjusts its <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Impulse_response#:~:text=In%20signal%20processing%20and%20control,response%20to%20some%20external%20change.\"><i><span style=\"font-weight: 400;\">impulse response<\/span><\/i><\/a><span style=\"font-weight: 400;\">. The adjustment is based on the error in the output. Therefore, with the proper adaptive algorithm, the filter can smoothly readjust itself under changing conditions to minimize the error. Examples of adaptive algorithms are least mean squares (LMS) and recursive least squares (RLS). <\/span><\/p>\n<div id=\"attachment_9836\" style=\"width: 439px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-9836\" loading=\"lazy\" class=\" wp-image-9836\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.11.55-380x178.png\" alt=\"Figure 2. 
SE with adaptive noise filtering block diagram\" width=\"429\" height=\"201\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.11.55-380x178.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.11.55-300x141.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.11.55-768x361.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.11.55-1536x721.png 1536w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.11.55-600x282.png 600w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.11.55.png 2032w\" sizes=\"(max-width: 429px) 100vw, 429px\" \/><\/p>\n<p id=\"caption-attachment-9836\" class=\"wp-caption-text\"><em>Figure 2. SE with adaptive noise filtering block diagram.<\/em><\/p>\n<\/div>\n<p><span style=\"font-weight: 400;\">Another example of multi-channel speech enhancement is <\/span><a href=\"https:\/\/krisp.ai\/blog\/hardware-beamforming-noise-reduction\/\"><i><span style=\"font-weight: 400;\">beamforming<\/span><\/i><\/a><span style=\"font-weight: 400;\">, which uses a microphone array to cancel out signals coming from directions other than the preferred source. Multi-channel speech enhancement can lead to promising results, but it requires several microphones and is technically difficult. <\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span> <span style=\"font-weight: 400;\">On the other hand, single-channel, or monaural speech enhancement, has a significant advantage because we don\u2019t need to set up extra microphone(s). 
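For contrast with the single-channel setting, here is a minimal sketch of the two-microphone adaptive noise cancelling scheme from Figure 2, using the LMS update mentioned earlier. It is an illustration only: the filter length, step size, and toy signals are assumptions, not a description of any production system.

```python
import numpy as np

def lms_cancel(primary, reference, n_taps=16, mu=0.005):
    """Two-microphone LMS noise canceller: adaptively filter the
    noise-only reference signal to estimate (and subtract) the noise
    component picked up by the primary microphone."""
    w = np.zeros(n_taps)               # adaptive filter coefficients
    out = np.zeros_like(primary)       # error signal = enhanced speech
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]  # current + past samples
        noise_est = w @ x              # filter's estimate of the noise
        e = primary[n] - noise_est     # error doubles as the clean output
        w += 2 * mu * e * x            # LMS update driven by the error
        out[n] = e
    return out

# Toy demo: a tone as "speech", white noise leaking into the primary mic.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
speech = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(t.size)
primary = speech + 0.5 * noise         # main microphone: speech + noise
enhanced = lms_cancel(primary, noise)  # reference microphone: noise only
```

After a few hundred samples the filter converges toward the 0.5 leakage gain and removes most of the noise; with real microphones the reference also picks up some speech, which is one reason multi-channel setups are technically difficult.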
The algorithm takes input from only one microphone, which is a noisy audio signal representing a mixture of speech and noise, and removes the unwanted noise.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The rest of the article is devoted to the monaural case.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">One of the first results in the monaural case is the <\/span><a href=\"https:\/\/www.semanticscholar.org\/paper\/Suppression-of-acoustic-noise-in-speech-using-Boll\/04d4d26f0866a6e2c16d6666b66f7a67f9f0c526\"><span style=\"font-weight: 400;\">spectral subtraction method<\/span><\/a><span style=\"font-weight: 400;\">. Many variants exist, but the idea behind the original method is the following:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Take the noisy input signal and apply a <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Short-time_Fourier_transform\"><span style=\"font-weight: 400;\">short-time Fourier transform (STFT)<\/span><\/a><span style=\"font-weight: 400;\"> algorithm<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Estimate the background noise by averaging the spectral magnitudes of audio segments (frames) without speech<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Subtract the noise estimate from the spectral magnitudes of noisy frames<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Then, using the original phases of the noisy frames, apply an inverse short-time Fourier transform (ISTFT) to get an approximation of the clean speech signal (see Figure 3).<\/span><\/li>\n<\/ul>\n<div id=\"attachment_9837\" style=\"width: 390px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-9837\" loading=\"lazy\" class=\"size-large wp-image-9837\"
src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.13.07-380x111.png\" alt=\"Figure 3: Spectral subtraction block diagram.\" width=\"380\" height=\"111\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.13.07-380x111.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.13.07-300x88.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.13.07-768x225.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.13.07-1536x451.png 1536w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.13.07-600x176.png 600w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-13-at-22.13.07.png 2032w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><\/p>\n<p id=\"caption-attachment-9837\" class=\"wp-caption-text\"><em>Figure 3: Spectral subtraction block diagram.<\/em><\/p>\n<\/div>\n<p><span style=\"font-weight: 400;\">Another classical solution is the <\/span><a href=\"https:\/\/ieeexplore.ieee.org\/document\/1164453\"><span style=\"font-weight: 400;\">minimum mean-square error<\/span><\/a><span style=\"font-weight: 400;\"> (MMSE) algorithm introduced by Ephraim and Malah.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With the rise of machine learning (ML), several solutions have also been proposed using ML-based traditional approaches such as <\/span><a href=\"https:\/\/asu.pure.elsevier.com\/en\/publications\/hmm-based-speech-enhancement-using-harmonic-modeling\"><span style=\"font-weight: 400;\">Hidden Markov Models<\/span><\/a><span style=\"font-weight: 400;\"> (HMM), <\/span><a 
href=\"https:\/\/www.inf.uni-hamburg.de\/en\/inst\/ab\/sp\/publications\/paper-for-conf-pdf\/2011-mohammadiha-gerkmann-leijon-ieee-international-symposium-on-intelligent-signal-processing-and-communication-systems-chiangmai-thailand-dec2011.pdf\"><span style=\"font-weight: 400;\">non-negative matrix factorization<\/span><\/a><span style=\"font-weight: 400;\"> (NMF), and <\/span><a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S1877050915014234\"><span style=\"font-weight: 400;\">wavelet transform<\/span><\/a><span style=\"font-weight: 400;\">.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To understand the limitations of these traditional approaches, note that we can divide noise signals into two categories: <\/span><i><span style=\"font-weight: 400;\">stationary<\/span><\/i><span style=\"font-weight: 400;\"> and <\/span><i><span style=\"font-weight: 400;\">non-stationary<\/span><\/i><span style=\"font-weight: 400;\">. Stationary noises have a simpler structure. Their characteristics are mainly constant over time, such as fan noise, white noise, wind noise, and river sound. Non-stationary noises have time-varying characteristics and are more widespread in real-life. They include traffic noises, construction noises, keyboard typing, cafeteria sounds, crowd noises, babies crying, clapping, animal sounds, and more. 
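The four spectral-subtraction steps listed above can be sketched in a few lines of Python with NumPy. This is a toy illustration of the original recipe: the frame size, hop, and the assumption that the leading frames are noise-only are demo choices made here.

```python
import numpy as np

def spectral_subtraction(noisy, noise_frames=10, frame=256, hop=128):
    """Basic magnitude spectral subtraction, following the steps above."""
    win = np.hanning(frame)
    n_frames = 1 + (len(noisy) - frame) // hop
    # 1. STFT: windowed FFT of overlapping frames
    spec = np.stack([np.fft.rfft(win * noisy[i * hop:i * hop + frame])
                     for i in range(n_frames)])
    mag, phase = np.abs(spec), np.angle(spec)
    # 2. Noise estimate: average magnitude of leading speech-free frames
    noise_mag = mag[:noise_frames].mean(axis=0)
    # 3. Subtract the estimate, flooring negative magnitudes at zero
    clean_mag = np.maximum(mag - noise_mag, 0.0)
    # 4. ISTFT with the original noisy phases, via overlap-add
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i in range(n_frames):
        seg = np.fft.irfft(clean_mag[i] * np.exp(1j * phase[i]), frame)
        out[i * hop:i * hop + frame] += win * seg
        norm[i * hop:i * hop + frame] += win ** 2
    return out / np.maximum(norm, 1e-8)

# Demo: a noise-only lead-in, then a tone ("speech") buried in the noise.
rng = np.random.default_rng(1)
n = 8192
noise = 0.3 * rng.standard_normal(n)
clean = np.zeros(n)
clean[2048:] = np.sin(2 * np.pi * 440 * np.arange(n - 2048) / 8000)
noisy = clean + noise
enhanced = spectral_subtraction(noisy)
```

On this toy input the residual error drops well below the input noise level, but the half-rectified residue left by the subtraction is exactly the "musical noise" that makes spectral subtraction unpleasant on real recordings.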
The traditional algorithms can effectively suppress stationary noises, but they have little to no effect on the more challenging non-stationary noises.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Recent advances in computer hardware and ML have led to increased research and industrial applications of algorithms based on deep learning methods, such as <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Artificial_neural_network\"><span style=\"font-weight: 400;\">artificial neural networks<\/span><\/a><span style=\"font-weight: 400;\"> (NN). Starting in the 2010s, neural network algorithms made tremendous progress in natural language, image, and audio processing. These systems outperform traditional approaches <\/span><a href=\"https:\/\/krisp.ai\/blog\/speech-quality-measurement\/\"><span style=\"font-weight: 400;\">in terms of evaluation scores<\/span><\/a><span style=\"font-weight: 400;\">. 2015 saw <\/span><a href=\"https:\/\/dl.acm.org\/doi\/10.1109\/TASLP.2014.2364452\"><span style=\"font-weight: 400;\">the first results of speech enhancement via deep learning<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<h2><b>Deep learning approach<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Figure 4 is a typical block diagram of monaural speech enhancement using deep learning methods.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The goal is generally as follows:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Given a noisy signal consisting of arbitrary noise and speech components, create a deep learning model that reduces or entirely removes the noise while preserving the speech signal without any audible distortion.<\/span><\/p>\n<div id=\"attachment_9838\" style=\"width: 486px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-9838\" loading=\"lazy\" class=\" wp-image-9838\"
src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-4-380x138.png\" alt=\"Figure 4: SE using deep learning, a block diagram.\" width=\"476\" height=\"173\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-4-380x138.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-4-300x109.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-4-768x278.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-4-600x217.png 600w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-4.png 1176w\" sizes=\"(max-width: 476px) 100vw, 476px\" \/><\/p>\n<p id=\"caption-attachment-9838\" class=\"wp-caption-text\"><em>Figure 4: SE using deep learning, a block diagram.<\/em><\/p>\n<\/div>\n<p><span style=\"font-weight: 400;\">Let\u2019s go over the main steps of this approach.\u00a0<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Training data<\/b><span style=\"font-weight: 400;\">: deep learning is a data-driven approach, so the end quality of the model greatly depends on the quality and amount of training data. In the case of speech enhancement, the raw training data is an audio set consisting of noisy and clean speech samples. To obtain such data we need to collect a <\/span><i><span style=\"font-weight: 400;\">clean speech dataset<\/span><\/i><span style=\"font-weight: 400;\"> and a <\/span><i><span style=\"font-weight: 400;\">noise dataset<\/span><\/i><span style=\"font-weight: 400;\">. Then, by mixing clean speech and noise signals, we can artificially generate noisy\/clean speech pairs as the model\u2019s input\/output data points. 
These are the most important aspects of data quality:<\/span>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">A clean speech dataset should not contain any audible background noises<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Training voices and noises should be diverse to help the model generalize to unseen voices and noises<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">It\u2019s preferable that samples come from high-quality microphones because this gives more flexibility in data augmentation<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p><b>2. Feature extraction<\/b><span style=\"font-weight: 400;\">: An example of reasonable feature extraction is an <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Spectrogram\"><span style=\"font-weight: 400;\">audio spectrogram<\/span><\/a><span style=\"font-weight: 400;\"> or spectrogram-based features like <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Mel-frequency_cepstrum\"><span style=\"font-weight: 400;\">Mel-frequency cepstral coefficients (MFCCs)<\/span><\/a><span style=\"font-weight: 400;\">. These are time-frequency representations of the signal; MFCCs, in particular, reflect the human auditory system&#8217;s response.
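The spectrogram features just described boil down to a windowed FFT over short overlapping frames; a minimal sketch follows (the frame and hop sizes are arbitrary demo choices, and the pure tone is a toy stand-in for speech):

```python
import numpy as np

def log_power_spectrogram(signal, frame=512, hop=256, eps=1e-10):
    """Frame the signal, apply a Hann window, FFT, and return log power."""
    win = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    frames = np.stack([win * signal[i * hop:i * hop + frame]
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10 * np.log10(power + eps)       # shape: (time, frequency)

# A pure 1 kHz tone sampled at 8 kHz lights up a single frequency bin.
sig = np.sin(2 * np.pi * 1000 * np.arange(8000) / 8000)
spec = log_power_spectrogram(sig)           # 30 frames x 257 bins
```

Applying a mel filter bank and a DCT on top of such a spectrogram yields the MFCCs mentioned above.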
As shown in Figure 5, we can visualize spectrograms as a color map of power spectrum values over the time and frequency dimensions, where lighter colors indicate higher power values and darker colors lower ones.<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/p>\n<div id=\"attachment_9839\" style=\"width: 390px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-9839\" loading=\"lazy\" class=\"size-large wp-image-9839\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-5-380x214.png\" alt=\"Figure 5: Example of speech spectrogram.\" width=\"380\" height=\"214\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-5-380x214.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-5-300x169.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-5-768x432.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-5-1536x865.png 1536w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-5-600x338.png 600w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Figure-5.png 1600w\" sizes=\"(max-width: 380px) 100vw, 380px\" \/><\/p>\n<p id=\"caption-attachment-9839\" class=\"wp-caption-text\"><em>Figure 5: Example of speech spectrogram.<\/em><\/p>\n<\/div>\n<p><b>3. Neural Network<\/b><span style=\"font-weight: 400;\">: We can tune almost any type of neural network architecture for speech enhancement. We can treat spectrograms as images in order to use image processing techniques, such as <\/span><a href=\"https:\/\/www.isca-speech.org\/archive\/interspeech_2017\/park17c_interspeech.html\"><span style=\"font-weight: 400;\">convolutional networks<\/span><\/a><span style=\"font-weight: 400;\">. We can also represent audio as sequential data, meaning that recurrent neural networks can be a proper choice in that case.
This is particularly true for gated recurrent units (GRU) and <\/span><a href=\"https:\/\/ieeexplore.ieee.org\/document\/8461861\"><span style=\"font-weight: 400;\">long short-term memory units<\/span><\/a><span style=\"font-weight: 400;\"> (LSTM).\u00a0<\/span><\/p>\n<p><b>4. Training<\/b><span style=\"font-weight: 400;\">: During the training stage, the model \u201clearns\u201d generic patterns of clean speech spectrums and noise spectrums to distinguish between speech and noise. This ultimately enables it to recover the speech spectrum from the noisy\/corrupted input.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">After the training stage, we can use the model for inference. It takes noisy audio input, extracts features, passes it to the neural network, obtains the clean speech features, and, during post-processing, recovers the clean speech signal in the output. Studies show that speech enhancement models based on deep learning are superior to traditional approaches and show significant noise reduction, not only in the case of stationary noises but also in non-stationary ones.<\/span><span style=\"font-weight: 400;\">\u00a0\u00a0<\/span><\/p>\n<h2><b>Krisp Noise Cancellation<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Each use case dictates the SE algorithm\u2019s specific requirements. In the case of <\/span><a href=\"http:\/\/krisp.ai\"><span style=\"font-weight: 400;\">Krisp<\/span><\/a><span style=\"font-weight: 400;\">, our mission is to provide an on-device, real-time experience to users all over the world. That\u2019s why the model works on small chunks of the audio signal without introducing any noticeable latency and has small enough <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/FLOPS\"><span style=\"font-weight: 400;\">FLOPs<\/span><\/a><span style=\"font-weight: 400;\"> to consume a reasonable amount of computational resources. 
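Since the enhancer processes audio chunk by chunk, its algorithmic latency is simply the frame length plus any lookahead, divided by the sample rate. A toy calculation (the frame, lookahead, and sample-rate values below are hypothetical, not Krisp's actual configuration):

```python
def algorithmic_latency_ms(frame_samples, lookahead_samples, sample_rate):
    """Algorithmic latency of a frame-based enhancer: no output sample can
    be emitted before the whole frame (plus any lookahead) has arrived."""
    return 1000.0 * (frame_samples + lookahead_samples) / sample_rate

# e.g. 10 ms frames with no lookahead at 16 kHz
latency_ms = algorithmic_latency_ms(160, 0, 16000)  # 10.0 ms
```

Buffering, model compute time, and network transport add on top of this lower bound, which is why small frames and small FLOPs both matter for real-time use.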
To achieve this goal, we use a custom neural network architecture and digital signal processing algorithms in the pre- and post-processing stages. Our training dataset includes several thousand hours of clean speech and noise. During the training stage, we also apply various data augmentations to cover microphone diversity, acoustic conditions, signal-to-noise ratios (SNR), bandwidths, and other factors.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">We\u2019ve achieved an algorithmic latency of less than 20 ms, much less than the <\/span><a href=\"https:\/\/www.itu.int\/rec\/T-REC-G.114-200305-I\/en\"><span style=\"font-weight: 400;\">recommended maximum real-time latency of 200 ms<\/span><\/a><span style=\"font-weight: 400;\">. Our evaluations and <\/span><a href=\"https:\/\/krisp.ai\/blog\/krisp-noise-cancellation-comparison\/\"><span style=\"font-weight: 400;\">comparisons between our algorithms and other speech enhancement technologies<\/span><\/a><span style=\"font-weight: 400;\"> show superior results, both in the quality of the preserved voice and in the amount of eliminated noise.<\/span><\/p>\n<h2><strong>Try next-level audio and voice technologies<\/strong><\/h2>\n<p><a href=\"https:\/\/krisp.ai\/blog\/voice-communication-quality-with-krisp-sdk\/\" target=\"_blank\" rel=\"noopener\">Krisp licenses its SDKs<\/a> to embed directly into applications and devices.
<a href=\"https:\/\/krisp.ai\/developers\/\" target=\"_blank\" rel=\"noopener\">Learn more about Krisp&#8217;s SDKs<\/a> and begin your evaluation today.<\/p>\n<p><a href=\"https:\/\/krisp.ai\/developers\/\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" class=\"alignnone size-full wp-image-9589\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2022\/09\/engineering-blog-cta.png\" alt=\"krisp sdk\" width=\"1280\" height=\"720\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2022\/09\/engineering-blog-cta.png 1280w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2022\/09\/engineering-blog-cta-300x169.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2022\/09\/engineering-blog-cta-380x214.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2022\/09\/engineering-blog-cta-768x432.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\" \/><\/a><\/p>\n<hr \/>\n<p>This article was written by:<br \/>\nDr. Stepan Sargsyan, PhD in Mathematical Analysis. Dr. Sargsyan is an ML Architect at Krisp.<\/p>\n<hr \/>\n<h2><b>References<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">[1] Lim, J. and Oppenheim, A. V. (1979), Enhancement and bandwidth compression of noisy speech, Proc. IEEE, 67(12), 1586\u20131604.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[2] B. Widrow <i>et al.<\/i>, &#8220;Adaptive noise cancelling: Principles and applications,&#8221; in <i>Proceedings of the IEEE<\/i>, vol. 63, no. 12, pp. 1692-1716, Dec. 1975<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[3] S.
Boll, &#8220;Suppression of acoustic noise in speech using spectral subtraction,&#8221; in <\/span><i><span style=\"font-weight: 400;\">IEEE Transactions on Acoustics, Speech, and Signal Processing<\/span><\/i><span style=\"font-weight: 400;\">, vol. 27, no. 2, pp. 113-120, April 1979<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[4] Y. Ephraim and D. Malah, &#8220;Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator,&#8221; in <\/span><i><span style=\"font-weight: 400;\">IEEE Transactions on Acoustics, Speech, and Signal Processing<\/span><\/i><span style=\"font-weight: 400;\">, vol. 32, no. 6, pp. 1109-1121, December 1984<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[5] <\/span><span style=\"font-weight: 400;\">M. E. Deisher and A. S. Spanias, &#8220;HMM-based speech enhancement using harmonic modeling,&#8221; <\/span><i><span style=\"font-weight: 400;\">1997 IEEE International Conference on Acoustics, Speech, and Signal Processing<\/span><\/i><span style=\"font-weight: 400;\">, 1997, pp. 1175-1178 vol.2<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[6] N. Mohammadiha, T. Gerkmann and A. Leijon, &#8220;A new approach for speech enhancement based on a constrained Nonnegative Matrix Factorization,&#8221; <\/span><i><span style=\"font-weight: 400;\">2011 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS)<\/span><\/i><span style=\"font-weight: 400;\">, 2011, pp. 1-5<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[7] R. Patil, \u201cNoise Reduction using Wavelet Transform and Singular Vector Decomposition\u201d, Procedia Computer Science, vol. 54, 2015, pp 849-853,<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[8] <\/span><span style=\"font-weight: 400;\">Y. Xu, J. Du, L. -R. Dai and C. -H. 
Lee, &#8220;A Regression Approach to Speech Enhancement Based on Deep Neural Networks,&#8221; in <\/span><i><span style=\"font-weight: 400;\">IEEE\/ACM Transactions on Audio, Speech, and Language Processing<\/span><\/i><span style=\"font-weight: 400;\">, vol. 23, no. 1, pp. 7-19, Jan. 2015<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[9] Y. Zhao, B. Xu, R. Giri, and T. Zhang, &#8220;Perceptually guided speech enhancement using deep neural networks,&#8221; in 2018 IEEE Int. Conf. Acoustics Speech and Signal Processing Proc., 2018, pp. 5074-5078.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[10] T. Gao, J. Du, L. R. Dai, and C. H. Lee, &#8220;Densely connected progressive learning for LSTM-Based speech enhancement,&#8221; in 2018 IEEE Int. Conf. Acoustics Speech and Signal Processing Proc., 2018, pp. 5054-5058.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[11] S. R. Park and J. W. Lee, &#8220;A fully convolutional neural network for speech enhancement,&#8221; in Proc. Annu. Conf. Speech Communication Association Interspeech 2017.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[12] A. Pandey and D. Wang, &#8220;A new framework for CNN-Based speech enhancement in the time domain,&#8221; IEEE\/ACM Trans. Audio Speech Lang. Process., vol. 27, July. 2019.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[13] F. G. Germain, Q. Chen, and V. Koltun, &#8220;Speech denoising with deep feature losses,&#8221; in Proc. Annu. Conf. Speech Communication Association Interspeech, 2019.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[14] D. Baby and S. Verhulst, &#8220;Sergan: Speech enhancement using relativistic generative adversarial networks with gradient penalty,&#8221; in 2019 IEEE Int. Conf. Acoustics, Speech and Signal Processing Proc., 2019.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[15] H. Phan et al., &#8220;Improving GANs for speech enhancement,&#8221; IEEE Signal Process. Lett., vol. 
27, 2020.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[16] P. Karjol, M. A. Kumar, and P. K. Ghosh, &#8220;Speech enhancement using multiple deep neural networks,&#8221; in 2018 IEEE Int. Conf. Acoustics, Speech and Signal Processing Proc., 2018.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[17] H. Zhao, S. Zarar, I. Tashev, and C.-H. Lee, &#8220;Convolutional-Recurrent Neural Networks for Speech Enhancement,&#8221; 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 2401-2405.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">[18] ITU-T G.114: <\/span><a href=\"https:\/\/www.itu.int\/rec\/T-REC-G.114-200305-I\/en\"><span style=\"font-weight: 400;\">https:\/\/www.itu.int\/rec\/T-REC-G.114-200305-I\/en<\/span><\/a><\/p>\n","protected":false}}
Case\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/krisp.ai\/blog\/#website\",\"url\":\"https:\/\/krisp.ai\/blog\/\",\"name\":\"Krisp\",\"description\":\"Blog\",\"publisher\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/krisp.ai\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/krisp.ai\/blog\/#organization\",\"name\":\"Krisp\",\"url\":\"https:\/\/krisp.ai\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png\",\"contentUrl\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png\",\"width\":696,\"height\":696,\"caption\":\"Krisp\"},\"image\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/krispHQ\/\",\"https:\/\/x.com\/krispHQ\",\"https:\/\/www.linkedin.com\/company\/krisphq\/\",\"https:\/\/www.youtube.com\/channel\/UCAMZinJdR9P33fZUNpuxXtg\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/person\/172d23b73915155e0ab4e97868216bd1\",\"name\":\"Krisp Research Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/49fc839d54b3ccba70e28ccaad1472a7?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/49fc839d54b3ccba70e28ccaad1472a7?s=96&d=mm&r=g\",\"caption\":\"Krisp Research Team\"},\"url\":\"https:\/\/krisp.ai\/blog\/author\/research-team\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Speech Enhancement Review: Krisp Use Case - Krisp","description":"Speech enhancement review \u2013 Krisp use case: Learn how Krisp\u2019s AI-powered noise cancellation transforms speech clarity for calls, meetings, and recordings.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/","og_locale":"en_US","og_type":"article","og_title":"Speech Enhancement Review: Krisp Use Case - Krisp","og_description":"Speech enhancement review \u2013 Krisp use case: Learn how Krisp\u2019s AI-powered noise cancellation transforms speech clarity for calls, meetings, and recordings.","og_url":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/","og_site_name":"Krisp","article_publisher":"https:\/\/www.facebook.com\/krispHQ\/","article_published_time":"2023-02-13T18:24:22+00:00","article_modified_time":"2025-03-12T07:43:19+00:00","og_image":[{"width":2000,"height":1400,"url":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Speech-Enhancement.png","type":"image\/png"}],"author":"Krisp Research Team","twitter_card":"summary_large_image","twitter_creator":"@krispHQ","twitter_site":"@krispHQ","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/#article","isPartOf":{"@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/"},"author":{"name":"Krisp Research Team","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/person\/172d23b73915155e0ab4e97868216bd1"},"headline":"Speech Enhancement Review: Krisp Use 
Case","datePublished":"2023-02-13T18:24:22+00:00","dateModified":"2025-03-12T07:43:19+00:00","mainEntityOfPage":{"@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/"},"wordCount":2069,"commentCount":8,"publisher":{"@id":"https:\/\/krisp.ai\/blog\/#organization"},"image":{"@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/#primaryimage"},"thumbnailUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Speech-Enhancement.png","keywords":["Krisp","noise cancellation","speech enhancement"],"articleSection":["Engineering Blog","Noise Cancellation"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/","url":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/","name":"Speech Enhancement Review: Krisp Use Case - Krisp","isPartOf":{"@id":"https:\/\/krisp.ai\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/#primaryimage"},"image":{"@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/#primaryimage"},"thumbnailUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Speech-Enhancement.png","datePublished":"2023-02-13T18:24:22+00:00","dateModified":"2025-03-12T07:43:19+00:00","description":"Speech enhancement review \u2013 Krisp use case: Learn how Krisp\u2019s AI-powered noise cancellation transforms speech clarity for calls, meetings, and 
recordings.","breadcrumb":{"@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/#primaryimage","url":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Speech-Enhancement.png","contentUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/02\/Speech-Enhancement.png","width":2000,"height":1400,"caption":"Speech Enhancement Review: Krisp Use Case"},{"@type":"BreadcrumbList","@id":"https:\/\/krisp.ai\/blog\/speech-enhancement-review-krisp-use-case\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/krisp.ai\/blog\/"},{"@type":"ListItem","position":2,"name":"Speech Enhancement Review: Krisp Use Case"}]},{"@type":"WebSite","@id":"https:\/\/krisp.ai\/blog\/#website","url":"https:\/\/krisp.ai\/blog\/","name":"Krisp","description":"Blog","publisher":{"@id":"https:\/\/krisp.ai\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/krisp.ai\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/krisp.ai\/blog\/#organization","name":"Krisp","url":"https:\/\/krisp.ai\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png","contentUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png","width":696,"height":696,"caption":"Krisp"},"image":{"@id":"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/krispHQ\/","https:\/\/x.com\/krisp
HQ","https:\/\/www.linkedin.com\/company\/krisphq\/","https:\/\/www.youtube.com\/channel\/UCAMZinJdR9P33fZUNpuxXtg"]},{"@type":"Person","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/person\/172d23b73915155e0ab4e97868216bd1","name":"Krisp Research Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/49fc839d54b3ccba70e28ccaad1472a7?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/49fc839d54b3ccba70e28ccaad1472a7?s=96&d=mm&r=g","caption":"Krisp Research Team"},"url":"https:\/\/krisp.ai\/blog\/author\/research-team\/"}]}},"_links":{"self":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts\/9833"}],"collection":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/users\/65"}],"replies":[{"embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/comments?post=9833"}],"version-history":[{"count":5,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts\/9833\/revisions"}],"predecessor-version":[{"id":9844,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts\/9833\/revisions\/9844"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/media\/9834"}],"wp:attachment":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/media?parent=9833"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/categories?post=9833"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/tags?post=9833"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}