


{"id":10005,"date":"2023-07-06T10:24:19","date_gmt":"2023-07-06T06:24:19","guid":{"rendered":"https:\/\/krisp.ai\/blog\/?p=10005"},"modified":"2024-03-18T14:36:16","modified_gmt":"2024-03-18T10:36:16","slug":"can-you-hear-a-room","status":"publish","type":"post","link":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/","title":{"rendered":"Can You Hear a Room?"},"content":{"rendered":"<h2><strong>Introduction<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Sound propagation has distinctive features associated with the environment where it happens. Human ears can often clearly distinguish whether a given sound recording was produced in a small room, large room, or outdoors. One can even get a sense of a direction or a distance from the sound source by listening to a recording. These characteristics are defined by the objects around the listener or a recording microphone such as the size and material of walls in a room, furniture, people, etc. Every object has its own sound reflection, absorption, and diffraction properties, and all of them together define the way a sound propagates, reflects, attenuates, and reaches the listener.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">In acoustic signal processing, one often needs a way to model the sound field in a room with certain characteristics, in order to reproduce a sound in that specific setting, so to speak. Of course, one could simply go to that room, reproduce the required sound and record it with a microphone. However, in many cases, this is inconvenient or even infeasible.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">For example, suppose we want to build a Deep Neural Net (DNN)-based voice assistant in a device with a microphone that receives pre-defined voice commands and performs actions accordingly. We need to make our DNN model robust to various room conditions. 
To this end, we could arrange many rooms with various conditions, reproduce\/record our commands in those rooms, and feed the obtained data to our model. Now, if we decide to add a new command, we would have to do all this work once again. Other examples are Virtual Reality (VR) applications or architectural planning of buildings where we need to model the acoustic environment in places that simply do not exist in reality.\u00a0\u00a0\u00a0\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">In the case of our voice assistant, it would be beneficial to be able to encode and digitally record the acoustic properties of a room in some way so that we could take any sound recording and \u201cembed\u201d it in the room by using the room \u201cencoding\u201d. This would free us from physically accessing the room every time we need it. In the case of VR or architectural planning applications, the goal then would be to digitally generate a room\u2019s encoding only based on its desired physical dimensions and the materials and objects contained in it.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Thus, we are looking for a way to capture the acoustic properties of a room in a digital record, so that we can reproduce any given audio recording as if it was played in that room. This would be a digital acoustic model of the room representing its geometry, materials, and other things that make us \u201chear a room\u201d in a certain sense.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2><strong>What is RIR?<\/strong><\/h2>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Room impulse response (RIR for short) is something that does capture room acoustics, to a large extent. A room with a given sound source and a receiver can be thought of as a black-box system. Upon receiving on its input a sound signal emitted by the source, the system transforms it and outputs whatever is received at the receiver. 
The transformation corresponds to the reflections, scattering, diffraction, attenuation, and other effects that the signal undergoes before reaching the receiver. Impulse response describes such systems under the assumption of <\/span><i><span style=\"font-weight: 400;\">time-invariance<\/span><\/i><span style=\"font-weight: 400;\"> and <\/span><i><span style=\"font-weight: 400;\">linearity<\/span><\/i><span style=\"font-weight: 400;\">. In the case of RIR, time-invariance means that the room is in a steady state, i.e., the acoustic conditions do not change over time. For example, a room with people moving around, or a room where outside noise can be heard, is not time-invariant since the acoustic conditions change with time. Linearity means that if the input signal is a scaled superposition of two other signals, <\/span><i><span style=\"font-weight: 400;\">x<\/span><\/i><span style=\"font-weight: 400;\"> and <\/span><i><span style=\"font-weight: 400;\">y<\/span><\/i><span style=\"font-weight: 400;\">, then the output signal is a similarly scaled superposition of the output signals corresponding to <\/span><i><span style=\"font-weight: 400;\">x<\/span><\/i><span style=\"font-weight: 400;\"> and <\/span><i><span style=\"font-weight: 400;\">y<\/span><\/i><span style=\"font-weight: 400;\"> individually. Linearity holds with sufficient fidelity in most practical acoustic environments (while time-invariance can be achieved in a controlled environment).<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Let us take a digital approximation of a sound signal. It is a sequence of discrete samples, as shown in Fig. 
1.<\/span><\/p>\n<div id=\"attachment_10006\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-10006\" loading=\"lazy\" class=\"wp-image-10006\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-1.png\" alt=\"Sound wave form\" width=\"800\" height=\"322\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-1.png 949w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-1-300x121.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-1-380x153.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-1-768x309.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-1-600x242.png 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/p>\n<p id=\"caption-attachment-10006\" class=\"wp-caption-text\">Fig. 1 The waveform of a sound signal.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Each sample is a positive or negative number that corresponds to the degree of instantaneous excitation of the sound source, e.g., a loudspeaker membrane, as measured at discrete time steps. It can be viewed as an extremely short sound, or an <\/span><i><span style=\"font-weight: 400;\">impulse<\/span><\/i><span style=\"font-weight: 400;\">. The signal can thus be approximately viewed as a sequence of scaled impulses. Now, given time-invariance and linearity of the system, some mathematics shows that the effect of a room-source-receiver system on an audio signal can be completely described by its effect on a single impulse, which is usually referred to as\u00a0 an <\/span><i><span style=\"font-weight: 400;\">impulse response<\/span><\/i><span style=\"font-weight: 400;\">. 
More concretely, impulse response is a function <\/span><i><span style=\"font-weight: 400;\">h(t)<\/span><\/i><span style=\"font-weight: 400;\"> of time <\/span><i><span style=\"font-weight: 400;\">t &gt; 0<\/span><\/i><span style=\"font-weight: 400;\"> (response to a unit impulse at time <\/span><i><span style=\"font-weight: 400;\">t = 0<\/span><\/i><span style=\"font-weight: 400;\">) such that for an input sound signal <\/span><i><span style=\"font-weight: 400;\">x(t)<\/span><\/i><span style=\"font-weight: 400;\">, the system\u2019s output is given by the <\/span><i><span style=\"font-weight: 400;\">convolution<\/span><\/i><span style=\"font-weight: 400;\"> between the input and the impulse response. This is a mathematical operation that, informally speaking, produces a weighted sum of the delayed versions of the input signal where weights are defined by the impulse response. This reflects the intuitive fact that the received signal at time <\/span><i><span style=\"font-weight: 400;\">t<\/span><\/i><span style=\"font-weight: 400;\"> is a combination of delayed and attenuated values of the original signal up to time <\/span><i><span style=\"font-weight: 400;\">t<\/span><\/i><span style=\"font-weight: 400;\">, corresponding to reflections from walls and other objects, as well as scattering, attenuation and other acoustic effects.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">For example, in the recordings below, one can see the RIR recorded by a clapping sound (see below), an anechoic recording of singing, and their convolution.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">RIR<\/span><\/p>\n<p><!--[if lt IE 9]><script>document.createElement('audio');<\/script><![endif]--><br \/>\n<audio class=\"wp-audio-shortcode\" id=\"audio-10005-1\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/clap_rir.wav?_=1\" \/><a 
href=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/clap_rir.wav\">https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/clap_rir.wav<\/a><\/audio><\/p>\n<p>Singing anechoic<\/p>\n<p><audio class=\"wp-audio-shortcode\" id=\"audio-10005-2\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/singing_anechoic.wav?_=2\" \/><a href=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/singing_anechoic.wav\">https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/singing_anechoic.wav<\/a><\/audio><\/p>\n<p>Singing with RIR<\/p>\n<p><audio class=\"wp-audio-shortcode\" id=\"audio-10005-3\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/wav\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/singing_rir.wav?_=3\" \/><a href=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/singing_rir.wav\">https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/singing_rir.wav<\/a><\/audio><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">It is often useful to consider sound signals in the <\/span><i><span style=\"font-weight: 400;\">frequency domain, <\/span><\/i><span style=\"font-weight: 400;\">as opposed to the time domain. It is known from Fourier analysis that every well-behaved periodic function can be expressed as a sum (infinite, in general) of scaled sinusoids. The sequence of the (complex) coefficients of sinusoids within the sum, the <\/span><i><span style=\"font-weight: 400;\">Fourier coefficients<\/span><\/i><span style=\"font-weight: 400;\">, provides another, yet equivalent representation of the function. 
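The convolution described above is easy to try directly. Below is a minimal Python sketch using NumPy/SciPy, with a synthetic stand-in for a dry recording and an RIR (a direct-path spike plus a decaying noise tail) rather than the actual audio files linked here; the sample rate and decay constants are illustrative assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
fs = 16000                                   # sample rate, Hz (assumed)

# Synthetic stand-ins: a "dry" (anechoic) signal and an RIR with a
# direct-path spike followed by an exponentially decaying noise tail.
dry = rng.standard_normal(fs)                # 1 s of a dry signal
n_rir = fs // 2                              # 0.5 s RIR
decay = np.exp(-np.arange(n_rir) / (0.05 * fs))
rir = 0.3 * rng.standard_normal(n_rir) * decay
rir[0] = 1.0                                 # direct sound

# "Embedding" the dry signal in the room is a convolution with the RIR.
wet = fftconvolve(dry, rir)
```

The output is longer than the input by the RIR length minus one sample, which is exactly the extra "ringing" the room adds after the source stops.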
In other words, a sound signal can be viewed as a superposition of sinusoidal sound waves or <\/span><i><span style=\"font-weight: 400;\">tones<\/span><\/i><span style=\"font-weight: 400;\"> of different frequencies, and the Fourier coefficients show the contribution of each frequency in the signal. For finite sequences such as digital audio, that are of practical interest, such decompositions into periodic waves can be efficiently computed via the <\/span><i><span style=\"font-weight: 400;\">Fast Fourier Transform (FFT)<\/span><\/i><span style=\"font-weight: 400;\">.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">For non-stationary signals such as speech and music, it is more instructive to do analysis using the short-time<\/span><i><span style=\"font-weight: 400;\"> Fourier transform (STFT)<\/span><\/i><span style=\"font-weight: 400;\">. Here, we split the signal into short equal-length segments and compute the Fourier transform for each segment. This shows how the frequency content of the signal evolves with time (see Fig. 2). 
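The STFT just described (short equal-length segments, an FFT per segment) can be computed in a few lines. This sketch uses `scipy.signal.stft` on a toy non-stationary signal; the window length and the two tone frequencies are arbitrary choices for illustration.

```python
import numpy as np
from scipy.signal import stft

fs = 8000
t = np.arange(fs) / fs                        # 1 s of samples
# Toy non-stationary signal: a 440 Hz tone that jumps to 1320 Hz at t = 0.5 s.
x = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1320 * t))

# STFT: split into short windowed segments and take the FFT of each.
f, frames, Z = stft(x, fs=fs, nperseg=256)    # 256-sample windows (assumed)

# A spectrogram shows the STFT magnitudes on a logarithmic (dB) scale.
spec_db = 20 * np.log10(np.abs(Z) + 1e-12)
```

Early STFT columns peak near 440 Hz and late ones near 1320 Hz, which is precisely the time-frequency picture a plain FFT of the whole signal would blur together.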
That is, while the signal waveform and Fourier transform give us only time and only frequency information about the signal (although one being recoverable from another), the STFT provides something in between.<\/span><\/p>\n<p>&nbsp;<\/p>\n<div id=\"attachment_10007\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-10007\" loading=\"lazy\" class=\"wp-image-10007\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-2.png\" alt=\"Spectrogram of a speech signal.\" width=\"800\" height=\"236\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-2.png 1550w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-2-300x89.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-2-380x112.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-2-768x227.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-2-1536x454.png 1536w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-2-600x177.png 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/p>\n<p id=\"caption-attachment-10007\" class=\"wp-caption-text\">Fig. 2 Spectrogram of a speech signal.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">A visual representation of an STFT, such as the one in Fig. 
2, is called a <\/span><i><span style=\"font-weight: 400;\">spectrogram.<\/span><\/i><span style=\"font-weight: 400;\"> The horizontal and vertical axes show time and frequency, respectively, while the color intensity represents the magnitude of the corresponding Fourier coefficient on a <\/span><i><span style=\"font-weight: 400;\">logarithmic<\/span><\/i><span style=\"font-weight: 400;\"> scale (the brighter the color, the larger is the magnitude of the frequency at the given time).<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Measurement and Structure of RIR<\/strong><\/h2>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">In theory, the impulse response of a system can be measured by feeding it a unit impulse and recording whatever comes at the output with a microphone. Still, in practice, we cannot produce an instantaneous and powerful audio signal. Instead, one could record RIR approximately by using short impulsive sounds. One could use a clapping sound, a starter gun, a balloon popping sound, or the sound of an electric spark discharge.<\/span><\/p>\n<div id=\"attachment_10008\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-10008\" loading=\"lazy\" class=\"wp-image-10008\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-3.png\" alt=\"waveform of a RIR\" width=\"800\" height=\"330\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-3.png 1844w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-3-300x124.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-3-380x157.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-3-768x317.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-3-1536x633.png 1536w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-3-600x247.png 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/p>\n<p 
id=\"caption-attachment-10008\" class=\"wp-caption-text\">Fig. 3 The spectrogram and waveform of a RIR produced by a clapping sound.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">The results of such measurements (see, for example, Fig. 3) may not be sufficiently accurate for a particular application, due to the error introduced by the structure of the input signal. An ideal impulse, in some mathematical sense, has a <\/span><i><span style=\"font-weight: 400;\">flat spectrum<\/span><\/i><span style=\"font-weight: 400;\">, that is, it contains all frequencies with equal magnitude. The impulsive sounds above usually deviate significantly from this property. Measurements with such signals may also be poorly reproducible. Alternatively, a digitally created impulsive sound with desired characteristics could be played with a loudspeaker, but the power of the signal would still be limited by the speaker characteristics. Other limitations of measurements with impulsive sounds include heightened sensitivity to external noise (from outside the room), sensitivity to nonlinear effects of the recording microphone or emitting speaker, and the directionality of the sound source.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Fortunately, there are more robust methods of measuring room impulse response. The main idea behind these techniques is to play a <\/span><i><span style=\"font-weight: 400;\">transformed<\/span><\/i><span style=\"font-weight: 400;\"> impulsive sound with a speaker, record the output, and apply an inverse transform to recover the impulse response. The rationale is that, since we cannot play an impulse as it is with sufficient power, we \u201cspread\u201d its power across time, so to speak, while maintaining the flat spectrum property over a useful range of frequencies. An example of such a \u201cstretched impulse\u201d is shown in Fig. 
4.<\/span><\/p>\n<p>&nbsp;<\/p>\n<div id=\"attachment_10009\" style=\"width: 620px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-10009\" loading=\"lazy\" class=\"size-full wp-image-10009\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-4.png\" alt=\"streched impulse\" width=\"610\" height=\"373\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-4.png 610w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-4-300x183.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-4-380x232.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-4-600x367.png 600w\" sizes=\"(max-width: 610px) 100vw, 610px\" \/><\/p>\n<p id=\"caption-attachment-10009\" class=\"wp-caption-text\">Fig. 4 A \u201cstretched\u201d impulse.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Other variants of such signals are Maximum Length Sequences and Exponential Sine Sweep. An advantage of measurement with such non-localized and reproducible test signals is that ambient noise and microphone nonlinearities can be effectively averaged out. There are also some technicalities that need to be dealt with. For example, the need for synchronization of emitting and recording ends, ensuring that the test signal covers the whole length of impulse response, and the need for <\/span><i><span style=\"font-weight: 400;\">deconvolution,<\/span><\/i><span style=\"font-weight: 400;\"> that is applying an inverse transform for recovering the impulse response.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">The waveform on Fig. 5 shows another measured RIR. The initial spike at 0-3 ms corresponds to the <\/span><i><span style=\"font-weight: 400;\">direct sound<\/span><\/i><span style=\"font-weight: 400;\"> that has arrived to the microphone along a direct path. 
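The sweep-and-deconvolve idea above can be checked in simulation. The sketch below is a simplified illustration, not a full Exponential Sine Sweep pipeline: it convolves a logarithmic sweep with a known toy RIR (playing the role of the room) and recovers the RIR by regularized spectral division; the sweep band, RIR taps, and regularization floor are all assumptions for the demo.

```python
import numpy as np
from scipy.signal import chirp, fftconvolve

fs = 8000
t = np.arange(fs) / fs
# A 1 s logarithmic sine sweep from 50 Hz to 3.5 kHz (illustrative band).
sweep = chirp(t, f0=50, t1=t[-1], f1=3500, method='logarithmic')

# A toy "true" RIR: direct sound plus two early reflections.
h_true = np.zeros(512)
h_true[[0, 90, 200]] = [1.0, 0.6, 0.3]

# What the microphone would record in a noiseless, linear, time-invariant room.
recorded = fftconvolve(sweep, h_true)

# Deconvolution by regularized spectral division: recover h from the recording
# and the known test signal (the small floor avoids dividing by near-zeros).
N = 16384                                     # FFT size >= len(recorded)
S = np.fft.rfft(sweep, N)
R = np.fft.rfft(recorded, N)
h_est = np.fft.irfft(R * np.conj(S) / (np.abs(S) ** 2 + 1e-6), N)[: len(h_true)]
```

With a real measurement, the same division is applied to the microphone signal; averaging over repeated sweeps then suppresses ambient noise, which is the robustness advantage discussed above.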
The smaller spikes following it and starting from about 3-5 ms from the first spike clearly show several <\/span><i><span style=\"font-weight: 400;\">early specular reflections<\/span><\/i><span style=\"font-weight: 400;\">. After about 80 ms there are no distinctive specular reflections left, and what we see is the <\/span><i><span style=\"font-weight: 400;\">late reverberation<\/span><\/i><span style=\"font-weight: 400;\"> or the <\/span><i><span style=\"font-weight: 400;\">reverberant tail<\/span><\/i><span style=\"font-weight: 400;\"> of the RIR.\u00a0\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<div id=\"attachment_10010\" style=\"width: 810px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-10010\" loading=\"lazy\" class=\"wp-image-10010\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-5.png\" alt=\"room impulse response\" width=\"800\" height=\"448\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-5.png 853w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-5-300x168.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-5-380x213.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-5-768x430.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-5-600x336.png 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/p>\n<p id=\"caption-attachment-10010\" class=\"wp-caption-text\">Fig. 5 A room impulse response. Time is shown in seconds.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">While the spectrogram of RIR seems not very insightful apart from the remarks so far, there is some information one can extract from it. It shows, in particular, how the intensity of different frequencies decreases with time due to losses. 
For example, it is known that intensity loss due to <\/span><i><span style=\"font-weight: 400;\">air absorption (attenuation)<\/span><\/i><span style=\"font-weight: 400;\"> is stronger for higher frequencies. At low frequencies, the spectrogram may exhibit distinct persistent frequency bands, <\/span><i><span style=\"font-weight: 400;\">room modes<\/span><\/i><span style=\"font-weight: 400;\">, that correspond to <\/span><i><span style=\"font-weight: 400;\">standing waves<\/span><\/i><span style=\"font-weight: 400;\"> in the room. This effect can be seen below a certain frequency threshold depending on the room geometry, the <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Room_acoustics\"><i><span style=\"font-weight: 400;\">Schroeder frequency<\/span><\/i><\/a><span style=\"font-weight: 400;\">, which for most rooms is <\/span><i><span style=\"font-weight: 400;\">20 &#8211; 250 Hz<\/span><\/i><span style=\"font-weight: 400;\">. Those modes are visible due to the lower density of resonant frequencies of the room near the bottom of the spectrum, with wavelength comparable to the room dimensions. At higher frequencies, modes overlap more and more and are not distinctly visible.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">RIR can also be used to estimate certain parameters associated with a room, the most well-known of them being the <\/span><i><span style=\"font-weight: 400;\">reverberation time<\/span><\/i><span style=\"font-weight: 400;\"> or <\/span><i><span style=\"font-weight: 400;\">RT60<\/span><\/i><span style=\"font-weight: 400;\">. When an active sound source in a room is abruptly stopped, it will take longer or shorter time for the sound intensity to drop to a certain level, depending on the room\u2019s geometry, materials, and other factors. In the case of RT60, the question is, how long it takes for the sound energy density to decrease by 60 decibels (dB), that is, to the millionth of its initial value. 
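The Schroeder frequency mentioned above is commonly approximated as f_S ≈ 2000 · sqrt(RT60 / V), with RT60 in seconds and the room volume V in cubic meters. A tiny sketch (the room dimensions and RT60 are made-up examples):

```python
import math

def schroeder_frequency(rt60_s, volume_m3):
    """Common approximation: f_S ~= 2000 * sqrt(RT60 / V), RT60 in s, V in m^3."""
    return 2000.0 * math.sqrt(rt60_s / volume_m3)

# A 5 m x 4 m x 3 m room (60 m^3) with RT60 = 0.5 s lands near 180 Hz,
# inside the 20-250 Hz range quoted above; larger rooms push it lower.
f_s = schroeder_frequency(0.5, 60.0)
```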
As noted by Schroeder (see the references), the average signal energy at time <\/span><i><span style=\"font-weight: 400;\">t<\/span><\/i><span style=\"font-weight: 400;\"> used for computing reverberation time is proportional to the tail energy of the RIR, that is the total energy after time <\/span><i><span style=\"font-weight: 400;\">t<\/span><\/i><span style=\"font-weight: 400;\">. Thus, we can compute RT60 by plotting the tail energy level of the RIR on a dB scale (with respect to the total energy). For example, the plot corresponding to the RIR above is shown in Fig. 6:<\/span><\/p>\n<div id=\"attachment_10011\" style=\"width: 651px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-10011\" loading=\"lazy\" class=\"size-full wp-image-10011\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-6.png\" alt=\"The RIR tail energy level curve\" width=\"641\" height=\"437\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-6.png 641w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-6-300x205.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-6-380x259.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-6-600x409.png 600w\" sizes=\"(max-width: 641px) 100vw, 641px\" \/><\/p>\n<p id=\"caption-attachment-10011\" class=\"wp-caption-text\">Fig. 6 The RIR tail energy level curve.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">In theory, the RIR tail energy decay should be exponential, that is, linear on a dB scale, but, as can be seen here, it drops irregularly starting at -25 dB. This is due to RIR measurement limitations. 
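Schroeder's tail-energy observation translates directly into code: integrate the squared RIR backwards, convert to dB, and fit a line over the usable part of the decay. This is a sketch under the usual convention of fitting between −5 dB and −25 dB; it is validated here on a synthetic RIR with a known decay rate.

```python
import numpy as np

def rt60_from_rir(rir, fs, db_hi=-5.0, db_lo=-25.0):
    """Estimate RT60 via Schroeder backward integration: compute the tail
    energy after each sample in dB and fit a line between db_hi and db_lo."""
    energy = np.asarray(rir, dtype=float) ** 2
    tail = np.cumsum(energy[::-1])[::-1]      # energy remaining after each sample
    decay_db = 10.0 * np.log10(tail / tail[0] + 1e-30)
    mask = (decay_db <= db_hi) & (decay_db >= db_lo)
    t = np.arange(len(rir))[mask] / fs
    slope, _ = np.polyfit(t, decay_db[mask], 1)   # dB per second (negative)
    return -60.0 / slope

# Check on a synthetic RIR whose energy decays by exactly 60 dB in 0.4 s.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
true_rt60 = 0.4
rir = rng.standard_normal(fs) * 10.0 ** (-3.0 * t / true_rt60)
est = rt60_from_rir(rir, fs)
```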
In such cases, one restricts the attention to the linear part, normally between the values -5 dB and -25 dB, and obtains RT60 by fitting a line to the measurements of RIR in logarithmic scale, by linear regression, for example.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2><strong>RIR Simulation<\/strong><\/h2>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">As mentioned in the introduction, one often needs to compute a RIR for a room with given dimensions and material specifications without physically building the room. One way of achieving this would be by actually building a scaled model of the room. Then we could measure the RIR by using test signals with accordingly scaled frequencies, and rescale the recorded RIR frequencies. A more flexible and cheaper way is through computer simulations, by building a digital model of the room and modeling sound propagation. Sound propagation in a room (or other media) is described with differential equations called <\/span><i><span style=\"font-weight: 400;\">wave equations<\/span><\/i><span style=\"font-weight: 400;\">. However, the exact solution of these equations is out of reach in most practical settings, and one has to resort to approximate methods for simulations.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">While there are many approaches for modeling sound propagation, most common simulation algorithms are based on either <\/span><i><span style=\"font-weight: 400;\">geometrical<\/span><\/i><span style=\"font-weight: 400;\"> simplification of sound propagation or <\/span><i><span style=\"font-weight: 400;\">element-based<\/span><\/i><span style=\"font-weight: 400;\"> methods. Element-based methods, such as the Finite Element method, rely on numerical solution of wave equations over a <\/span><i><span style=\"font-weight: 400;\">discretized<\/span><\/i><span style=\"font-weight: 400;\"> space. For this purpose, the room space is approximated with a discrete grid or a mesh of small volume elements. 
Accordingly, functions describing the sound field (such as the sound pressure or density) are defined down to the level of a single volume element. The advantage of these methods is that they are more faithful to the wave equations and hence more accurate. However, the computational complexity of element-based methods grows rapidly with frequency, as higher frequencies require a higher mesh resolution (smaller volume elements). For this reason, for wideband applications like speech, element-based methods are often used to model sound propagation only for low frequencies, say, up to 1 kHz.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Geometric methods, on the other hand, work in the time domain. They model sound propagation in terms of <\/span><i><span style=\"font-weight: 400;\">sound<\/span><\/i> <i><span style=\"font-weight: 400;\">rays<\/span><\/i><span style=\"font-weight: 400;\"> or <\/span><i><span style=\"font-weight: 400;\">particles<\/span><\/i><span style=\"font-weight: 400;\"> with intensity decreasing with the squared path length from the source. As such, wave-specific interference between rays is abstracted away. Thus, rays effectively become sound energy carriers, with the sound energy at a point computed as the sum of the energies of rays passing through that point. Geometric methods give plausible results for not-too-low frequencies, e.g., above the Schroeder frequency. Below that, wave effects are more prominent (recall the remarks on room modes above), and geometric methods may be inaccurate.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">The room geometry is usually modeled with polygons. 
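The element-based idea described above, discretizing space and numerically stepping the wave equation, can be shown in a deliberately tiny one-dimensional form (real solvers work on 3-D meshes with realistic boundary conditions; the grid size, step count, and initial pulse here are arbitrary).

```python
import numpy as np

# 1-D wave equation u_tt = c^2 * u_xx on a grid, leapfrog finite differences,
# with fixed (perfectly reflecting) endpoints standing in for rigid walls.
nx, nt = 200, 80
C = 1.0                           # Courant number c*dt/dx; must be <= 1 here

u_prev = np.zeros(nx)
u = np.zeros(nx)
u[100] = 1.0                      # a unit pressure pulse...
u_prev[99] = 1.0                  # ...initialized to travel to the right

for _ in range(nt):
    u_next = np.zeros(nx)
    u_next[1:-1] = (2.0 * u[1:-1] - u_prev[1:-1]
                    + C**2 * (u[2:] - 2.0 * u[1:-1] + u[:-2]))
    u_prev, u = u, u_next
```

With C = 1 the scheme propagates the pulse exactly one cell per step; halving the wavelength of interest requires halving dx (and dt), which in 3-D multiplies the work by roughly sixteen, illustrating why these methods are reserved for low frequencies.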
Walls and other surfaces are assigned <\/span><i><span style=\"font-weight: 400;\">absorption<\/span><\/i><span style=\"font-weight: 400;\"> coefficients that describe the fraction of incident sound energy that is absorbed by the surface (and thus \u201clost\u201d from the simulation perspective); the rest is reflected back into the room. One may also need to model air absorption and sound <\/span><i><span style=\"font-weight: 400;\">scattering<\/span><\/i><span style=\"font-weight: 400;\"> by rough materials whose surface features are not too small compared to the sound wavelengths.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">Two well-known geometric methods are the <\/span><i><span style=\"font-weight: 400;\">stochastic Ray Tracing<\/span><\/i><span style=\"font-weight: 400;\"> and <\/span><i><span style=\"font-weight: 400;\">Image Source<\/span><\/i><span style=\"font-weight: 400;\"> methods. In Ray Tracing, a sound source emits a (large) number of sound rays in random directions, taking into account the directivity of the source. Each ray has some starting energy. 
It travels with the speed of sound and reflects from the walls while losing energy with each reflection, according to the absorption coefficients of walls, as well as due to air absorption and other losses.<\/span><\/p>\n<p>&nbsp;<\/p>\n<div id=\"attachment_10012\" style=\"width: 472px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-10012\" loading=\"lazy\" class=\"wp-image-10012\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-7.png\" alt=\"Ray Tracing (only wall absorption shown)\" width=\"462\" height=\"400\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-7.png 1326w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-7-300x260.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-7-380x329.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-7-768x665.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-7-600x519.png 600w\" sizes=\"(max-width: 462px) 100vw, 462px\" \/><\/p>\n<p id=\"caption-attachment-10012\" class=\"wp-caption-text\">Fig. 7 Ray Tracing (only wall absorption shown).<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">The reflections are either specular (incident and reflected angles are equal) or scattering happens, the latter usually being modeled by a random reflection direction. The receiver registers the remaining energy, time and angle of arrival of each ray that hits its surface. Time is tracked in discrete intervals. Thus, one gets an <\/span><i><span style=\"font-weight: 400;\">energy histogram<\/span><\/i><span style=\"font-weight: 400;\"> corresponding to the RIR with a bucket for each time interval. In order to synthesize the temporal structure of the RIR, a random Poisson-distributed sequence of signed unit impulses can be generated, which is then scaled according to the energy histogram obtained from simulation to give a RIR. 
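The last step, turning the simulated energy histogram into an RIR by scaling a random impulse sequence, can be sketched as follows. The histogram here is a synthetic stand-in for a ray-tracing result, and the growing impulse density is a simple model of the increasing echo density of a reverberant tail (all constants are assumptions).

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 8000
bucket_len = fs // 1000                        # samples per 1 ms bucket (assumed)
n_buckets = 300                                # 0.3 s of simulated decay

# Stand-in for a ray-tracing result: receiver energy per time bucket.
energy = np.exp(-np.arange(n_buckets) / 60.0)

# Poisson-like sequence of signed unit impulses, with the impulse density
# growing over time to mimic the echo density of a real reverberant tail.
n = n_buckets * bucket_len
density = np.minimum(1.0, 0.3 + np.arange(n) / n)
impulses = np.where(rng.random(n) < density, rng.choice([-1.0, 1.0], n), 0.0)

# Scale the impulses so that each bucket carries the simulated energy.
rir = np.zeros(n)
for b in range(n_buckets):
    seg = impulses[b * bucket_len : (b + 1) * bucket_len]
    e = float(np.sum(seg ** 2))
    if e > 0.0:
        rir[b * bucket_len : (b + 1) * bucket_len] = seg * np.sqrt(energy[b] / e)
```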
For psychoacoustic reasons, one may want to treat different frequency bands separately. In this case, the procedure of scaling the random impulse sequence is done for band-passed versions of the sequence, then their sum is taken as the final RIR.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The Image Source method models only specular reflections (no scattering). In this case, a reflected ray from a source towards a receiver can be replaced with rays coming from \u201cmirror images\u201d of the source with respect to the reflecting wall, as shown in Fig. 8.<\/span><\/p>\n<div id=\"attachment_10013\" style=\"width: 489px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-10013\" loading=\"lazy\" class=\"wp-image-10013\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-8.png\" alt=\"The Image Source method\" width=\"479\" height=\"400\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-8.png 1002w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-8-300x250.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-8-380x317.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-8-768x641.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/RIR-Figure-8-600x501.png 600w\" sizes=\"(max-width: 479px) 100vw, 479px\" \/><\/p>\n<p id=\"caption-attachment-10013\" class=\"wp-caption-text\">Fig. 8 The Image Source method.<\/p>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-weight: 400;\">This way, instead of keeping track of reflections, we construct images of the source relative to each wall and consider straight rays from all sources (including the original one) to the receiver. These <\/span><i><span style=\"font-weight: 400;\">first order<\/span><\/i><span style=\"font-weight: 400;\"> images cover single reflections. 
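The first-order image construction just described is especially simple for an axis-aligned "shoebox" room, where mirroring is a one-coordinate reflection. A sketch with hypothetical room dimensions, source/receiver positions, and a single uniform wall absorption coefficient:

```python
import numpy as np

# First-order image sources for an axis-aligned shoebox room (hypothetical
# 5 x 4 x 3 m room, with walls at coordinate 0 and room[axis] on each axis).
room = np.array([5.0, 4.0, 3.0])
src = np.array([1.0, 2.0, 1.5])
rcv = np.array([4.0, 1.0, 1.2])

images = []
for axis in range(3):
    for wall in (0.0, room[axis]):
        img = src.copy()
        img[axis] = 2.0 * wall - img[axis]     # mirror the source across the wall
        images.append(img)

# Each image contributes a delayed, attenuated impulse: delay = distance / c,
# amplitude ~ 1/distance, scaled by sqrt(1 - alpha) for one wall reflection.
c = 343.0                                      # speed of sound, m/s
alpha = 0.3                                    # wall energy absorption coefficient
arrivals = [(np.linalg.norm(rcv - src) / c, 1.0 / np.linalg.norm(rcv - src))]
for img in images:
    d = np.linalg.norm(rcv - img)
    arrivals.append((d / c, np.sqrt(1.0 - alpha) / d))
```

In a convex shoebox room every first-order image is valid; for general polygonal rooms each image source must additionally be checked for validity before its impulse is added.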
For rays that reach the receiver after two reflections, we construct the images of the first order images, call them <\/span><i><span style=\"font-weight: 400;\">second order<\/span><\/i><span style=\"font-weight: 400;\"> images, and so on, recursively. For each reflection, we can also incorporate material absorption losses, as well as air absorption. The final RIR is constructed by considering each ray as an impulse that undergoes scaling due to absorption and distance-based energy losses, as well as a distance-based phase shift (delay) for each frequency component. Before that, we need to filter out invalid image sources, namely those for which the image-receiver path does not intersect the corresponding reflecting wall or is blocked by other walls.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">While the Image Source method captures specular reflections, it does not model scattering, which is an important aspect of the late reverberant part of a RIR. It does not model wave-based effects either. More generally, each existing method has its advantages and shortcomings. Fortunately, shortcomings of different approaches are often complementary, so it makes sense to use <\/span><i><span style=\"font-weight: 400;\">hybrid models<\/span><\/i><span style=\"font-weight: 400;\"> that combine several of the methods described above. For modeling late reverberations, stochastic methods like Ray Tracing are more suitable, while they may be too imprecise for modeling the early specular reflections in a RIR. One could further rely on element-based methods like the Finite Element method for modeling the RIR below the Schroeder frequency, where wave-based effects are more prominent.<\/span><\/p>\n<h2><strong>Summary<\/strong><\/h2>\n<p><span style=\"font-weight: 400;\">Room impulse response (RIR) plays a key role in modeling acoustic environments. 
Thus, when developing voice-related algorithms, be it for voice enhancement, automatic speech recognition, or something else, here at Krisp we need to keep in mind that these algorithms must be robust to changes in acoustic settings. This is usually achieved by incorporating the acoustic properties of various room environments, as briefly discussed here, into the design of the algorithms. This provides our users with a seamless experience, largely independent of the room from which Krisp is being used: they don\u2019t hear the room.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Try next-level audio and voice technologies \u00a0<\/strong><\/h2>\n<p><a href=\"https:\/\/krisp.ai\/blog\/voice-communication-quality-with-krisp-sdk\/\" target=\"_blank\" rel=\"noopener\">Krisp licenses its SDKs<\/a>\u00a0to embed directly into applications and devices. <a href=\"https:\/\/krisp.ai\/developers\/\" target=\"_blank\" rel=\"noopener\">Learn more about Krisp&#8217;s SDKs<\/a> and begin your evaluation today.<\/p>\n<p><a href=\"https:\/\/krisp.ai\/developers\/\"><img loading=\"lazy\" class=\"aligncenter wp-image-9898 size-full\" src=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/03\/engineering-blog-cta.png\" alt=\"Krisp Developers page banner\" width=\"1280\" height=\"720\" srcset=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/03\/engineering-blog-cta.png 1280w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/03\/engineering-blog-cta-300x169.png 300w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/03\/engineering-blog-cta-380x214.png 380w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/03\/engineering-blog-cta-768x432.png 768w, https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/03\/engineering-blog-cta-600x338.png 600w\" sizes=\"(max-width: 1280px) 100vw, 1280px\" \/><\/a><\/p>\n<h2>References<\/h2>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">[Overview of room acoustics techniques] <\/span><a 
href=\"https:\/\/link.springer.com\/book\/10.1007\/978-3-540-48830-9\"><span style=\"font-weight: 400;\">M. Vorl\u00e4nder, Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality. Springer, 2008.<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">[Overview of room acoustics techniques] <\/span><a href=\"https:\/\/www.taylorfrancis.com\/books\/mono\/10.1201\/9781482266450\/room-acoustics-heinrich-kuttruff\"><span style=\"font-weight: 400;\">H. Kuttruff, Room Acoustics (5th ed.). CRC Press, 2009.<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">[Signals and systems, including some Fourier analysis] <\/span><a href=\"https:\/\/link.springer.com\/book\/10.1007\/978-3-319-68675-2\"><span style=\"font-weight: 400;\">K. Deergha Rao, Signals and Systems. Birkh\u00e4user Cham, 2018.<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">[Exposition of simulation methods] <\/span><a href=\"http:\/\/publications.rwth-aachen.de\/record\/50580\/files\/3875.pdf\"><span style=\"font-weight: 400;\">D. Schr\u00f6der, Physically Based Real-Time Auralization of Interactive Virtual Environments. PhD thesis, RWTH Aachen, 2011.<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">[Maximum Length Sequences for RIR measurement] <\/span><a href=\"https:\/\/pubs.aip.org\/asa\/jasa\/article\/66\/2\/497\/647803\/Integrated-impulse-method-measuring-sound-decay\"><span style=\"font-weight: 400;\">M. R. Schroeder, \u201cIntegrated-impulse Method for Measuring Sound Decay without using Impulses\u201d. The Journal of the Acoustical Society of America, vol. 66, pp. 
497\u2013500, 1979.<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">[Stretched impulse method for RIR measurement] <\/span><a href=\"https:\/\/pubs.aip.org\/asa\/jasa\/article\/69\/5\/1484\/774064\/Computer-generated-pulse-signal-applied-for-sound\"><span style=\"font-weight: 400;\">N. Aoshima, \u201cComputer-generated Pulse Signal applied for Sound Measurement\u201d. The Journal of the Acoustical Society of America, vol. 69, no. 5, pp. 1484\u20131488, 1981.<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">[Exponential Sine Sweep technique for RIR measurement] A. Farina, \u201cSimultaneous Measurement of Impulse Response and Distortion with a Swept-sine Technique\u201d. In Audio Engineering Society Convention 108, 2000.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">[Comparison of RIR measurement techniques] <\/span><a href=\"https:\/\/www.aes.org\/e-lib\/browse.cfm?elib=11083\"><span style=\"font-weight: 400;\">G. B. Stan, J. J. Embrechts, and D. Archambeau, \u201cComparison of different Impulse Response Measurement Techniques\u201d. Journal of the Audio Engineering Society, vol. 50, pp. 249\u2013262, 2002.<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">[Schroeder Integration for RT60 calculation] <\/span><a href=\"https:\/\/pubs.aip.org\/asa\/jasa\/article\/37\/3\/409\/720995\/New-Method-of-Measuring-Reverberation-Time\"><span style=\"font-weight: 400;\">M. R. Schroeder, \u201cNew Method of Measuring Reverberation Time\u201d. The Journal of the Acoustical Society of America, vol. 37, no. 3, pp. 
409\u2013412, 1965.<\/span><\/a><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><a href=\"https:\/\/github.com\/RoyJames\/room-impulse-responses\"><span style=\"font-weight: 400;\">Room impulse response (RIR) datasets<\/span><\/a><\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<hr \/>\n<h4><b>The article is written by:<\/b><\/h4>\n<ul>\n<li><span style=\"font-weight: 400;\">Tigran Tonoyan, PhD in Computer Science, Senior ML Engineer II<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Hayk Aleksanyan, PhD in Mathematics, Architect, Tech Lead<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Aris Hovsepyan, MS in Computer Science, Senior ML Engineer I<\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Sound propagation has distinctive features associated with the environment where it happens. Human ears can often clearly distinguish whether a given sound recording was produced in a small room, large room, or outdoors. One can even get a sense of a direction or a distance from the sound source by listening to a recording. [&hellip;]<\/p>\n","protected":false},"author":65,"featured_media":10014,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"two_page_speed":[]},"categories":[421],"tags":[447,449,448,444,446,445],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v24.2 (Yoast SEO v23.6) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Can You Hear a Room? 
- Krisp<\/title>\n<meta name=\"description\" content=\"Discover the characteristics of any space with room impulse response, essential for audio professionals to analyze sound quality in different environments.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Can You Hear a Room? - Krisp\" \/>\n<meta property=\"og:description\" content=\"In the field of digital signal processing (DSP), one often needs to model the propagation of sound in a room with specific features. The room impulse response (RIR) serves as an encoding that captures the room&#039;s acoustics to a significant degree.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/\" \/>\n<meta property=\"og:site_name\" content=\"Krisp\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/krispHQ\/\" \/>\n<meta property=\"article:published_time\" content=\"2023-07-06T06:24:19+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-03-18T10:36:16+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"700\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Krisp Research Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:description\" content=\"In the field of digital signal processing (DSP), one often needs to model the propagation of sound in a room with specific features. 
The room impulse response (RIR) serves as an encoding that captures the room&#039;s acoustics to a significant degree.\" \/>\n<meta name=\"twitter:creator\" content=\"@krispHQ\" \/>\n<meta name=\"twitter:site\" content=\"@krispHQ\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/\"},\"author\":{\"name\":\"Krisp Research Team\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/person\/172d23b73915155e0ab4e97868216bd1\"},\"headline\":\"Can You Hear a Room?\",\"datePublished\":\"2023-07-06T06:24:19+00:00\",\"dateModified\":\"2024-03-18T10:36:16+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/\"},\"wordCount\":3568,\"commentCount\":6,\"publisher\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png\",\"keywords\":[\"audio sdk\",\"echo cancellation\",\"krisp sdk\",\"RIR\",\"room echo\",\"Room Impulse Response\"],\"articleSection\":[\"Engineering Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/\",\"url\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/\",\"name\":\"Can You Hear a Room? 
- Krisp\",\"isPartOf\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png\",\"datePublished\":\"2023-07-06T06:24:19+00:00\",\"dateModified\":\"2024-03-18T10:36:16+00:00\",\"description\":\"Discover the characteristics of any space with room impulse response, essential for audio professionals to analyze sound quality in different environments.\",\"breadcrumb\":{\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#primaryimage\",\"url\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png\",\"contentUrl\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png\",\"width\":1000,\"height\":700,\"caption\":\"The way sound propagates is determined to a large degree by the ambient space in which it occurs. In digital signal processing (DSP), one often needs to model the sound field in a room with specific characteristics. 
The room impulse response (RIR) provides a means to capture room acoustics to a considerable extent.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/krisp.ai\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Can You Hear a Room?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/krisp.ai\/blog\/#website\",\"url\":\"https:\/\/krisp.ai\/blog\/\",\"name\":\"Krisp\",\"description\":\"Blog\",\"publisher\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/krisp.ai\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/krisp.ai\/blog\/#organization\",\"name\":\"Krisp\",\"url\":\"https:\/\/krisp.ai\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png\",\"contentUrl\":\"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png\",\"width\":696,\"height\":696,\"caption\":\"Krisp\"},\"image\":{\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/krispHQ\/\",\"https:\/\/x.com\/krispHQ\",\"https:\/\/www.linkedin.com\/company\/krisphq\/\",\"https:\/\/www.youtube.com\/channel\/UCAMZinJdR9P33fZUNpuxXtg\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/person\/172d23b73915155e0ab4e97868216bd1\",\"name\":\"Krisp Research 
Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/krisp.ai\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/49fc839d54b3ccba70e28ccaad1472a7?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/49fc839d54b3ccba70e28ccaad1472a7?s=96&d=mm&r=g\",\"caption\":\"Krisp Research Team\"},\"url\":\"https:\/\/krisp.ai\/blog\/author\/research-team\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Can You Hear a Room? - Krisp","description":"Discover the characteristics of any space with room impulse response, essential for audio professionals to analyze sound quality in different environments.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/","og_locale":"en_US","og_type":"article","og_title":"Can You Hear a Room? - Krisp","og_description":"In the field of digital signal processing (DSP), one often needs to model the propagation of sound in a room with specific features. The room impulse response (RIR) serves as an encoding that captures the room's acoustics to a significant degree.","og_url":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/","og_site_name":"Krisp","article_publisher":"https:\/\/www.facebook.com\/krispHQ\/","article_published_time":"2023-07-06T06:24:19+00:00","article_modified_time":"2024-03-18T10:36:16+00:00","og_image":[{"width":1000,"height":700,"url":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png","type":"image\/png"}],"author":"Krisp Research Team","twitter_card":"summary_large_image","twitter_description":"In the field of digital signal processing (DSP), one often needs to model the propagation of sound in a room with specific features. 
The room impulse response (RIR) serves as an encoding that captures the room's acoustics to a significant degree.","twitter_creator":"@krispHQ","twitter_site":"@krispHQ","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#article","isPartOf":{"@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/"},"author":{"name":"Krisp Research Team","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/person\/172d23b73915155e0ab4e97868216bd1"},"headline":"Can You Hear a Room?","datePublished":"2023-07-06T06:24:19+00:00","dateModified":"2024-03-18T10:36:16+00:00","mainEntityOfPage":{"@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/"},"wordCount":3568,"commentCount":6,"publisher":{"@id":"https:\/\/krisp.ai\/blog\/#organization"},"image":{"@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#primaryimage"},"thumbnailUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png","keywords":["audio sdk","echo cancellation","krisp sdk","RIR","room echo","Room Impulse Response"],"articleSection":["Engineering Blog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/","url":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/","name":"Can You Hear a Room? 
- Krisp","isPartOf":{"@id":"https:\/\/krisp.ai\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#primaryimage"},"image":{"@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#primaryimage"},"thumbnailUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png","datePublished":"2023-07-06T06:24:19+00:00","dateModified":"2024-03-18T10:36:16+00:00","description":"Discover the characteristics of any space with room impulse response, essential for audio professionals to analyze sound quality in different environments.","breadcrumb":{"@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#primaryimage","url":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png","contentUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2023\/07\/isometric-noise-wave1.png","width":1000,"height":700,"caption":"The way sound propagates is determined to a large degree by the ambient space in which it occurs. In digital signal processing (DSP), one often needs to model the sound field in a room with specific characteristics. 
The room impulse response (RIR) provides a means to capture room acoustics to a considerable extent."},{"@type":"BreadcrumbList","@id":"https:\/\/krisp.ai\/blog\/can-you-hear-a-room\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/krisp.ai\/blog\/"},{"@type":"ListItem","position":2,"name":"Can You Hear a Room?"}]},{"@type":"WebSite","@id":"https:\/\/krisp.ai\/blog\/#website","url":"https:\/\/krisp.ai\/blog\/","name":"Krisp","description":"Blog","publisher":{"@id":"https:\/\/krisp.ai\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/krisp.ai\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/krisp.ai\/blog\/#organization","name":"Krisp","url":"https:\/\/krisp.ai\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png","contentUrl":"https:\/\/krisp.ai\/blog\/wp-content\/uploads\/2024\/10\/K.png","width":696,"height":696,"caption":"Krisp"},"image":{"@id":"https:\/\/krisp.ai\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/krispHQ\/","https:\/\/x.com\/krispHQ","https:\/\/www.linkedin.com\/company\/krisphq\/","https:\/\/www.youtube.com\/channel\/UCAMZinJdR9P33fZUNpuxXtg"]},{"@type":"Person","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/person\/172d23b73915155e0ab4e97868216bd1","name":"Krisp Research Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/krisp.ai\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/49fc839d54b3ccba70e28ccaad1472a7?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/49fc839d54b3ccba70e28ccaad1472a7?s=96&d=mm&r=g","caption":"Krisp Research 
Team"},"url":"https:\/\/krisp.ai\/blog\/author\/research-team\/"}]}},"_links":{"self":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts\/10005"}],"collection":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/users\/65"}],"replies":[{"embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/comments?post=10005"}],"version-history":[{"count":12,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts\/10005\/revisions"}],"predecessor-version":[{"id":10137,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/posts\/10005\/revisions\/10137"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/media\/10014"}],"wp:attachment":[{"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/media?parent=10005"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/categories?post=10005"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/krisp.ai\/blog\/wp-json\/wp\/v2\/tags?post=10005"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}