During training, our specially designed deep neural network (krispNet) is fed a very large number of distinct background noises and clean human voices. It learns to recognize background noise and separate it from human speech, leaving only the speech. During inference, krispNet operates on real-time audio and removes the background noise.
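To make the training setup concrete, here is a minimal sketch of how one noisy/clean training pair could be built by mixing a clean voice clip with a noise clip. This is an illustration under assumptions, not krispNet's actual data pipeline: the function names (`rms`, `mix_at_snr`), the signal-to-noise ratio parameter, and the synthetic signals standing in for real recordings are all hypothetical.

```python
import math
import random

def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mix_at_snr(clean, noise, snr_db):
    """Return (noisy, clean), with the noise scaled to the requested SNR in dB.

    Hypothetical helper: the noisy signal would be the network input and
    the clean signal the training target.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise clip is silent")
    # Gain that places the noise snr_db decibels below the clean signal.
    gain = rms(clean) / (noise_rms * 10 ** (snr_db / 20))
    noisy = [c + gain * n for c, n in zip(clean, noise)]
    return noisy, list(clean)

# Synthetic stand-ins for real recordings (assumption for illustration only):
# a 220 Hz tone plays the role of the voice, seeded random values the noise.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(1600)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]

noisy, target = mix_at_snr(clean, noise, snr_db=10)
```

Repeating this over many voice clips, noise clips, and SNR values would give the kind of varied data the paragraph above describes.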
The same krispNet DNN, trained on hundreds of hours of customized data, can perform Packet Loss Concealment (predicting the audio in lost network packets) and fill in missing voice chunks, eliminating "chopping" in voice calls. Voice audio produced by krispNet sounds much more natural to the human ear than the robotic sounds left by other similar algorithms.
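The sketch below shows where a concealment predictor sits in a packet stream. It is an assumption-laden illustration: `conceal_losses` and `repeat_last_frame` are hypothetical names, and the predictor shown is a simple repeat-and-fade baseline, not krispNet. A learned model would take the place of the `predict` argument.

```python
def conceal_losses(packets, frame_len, predict):
    """Rebuild a continuous sample stream; None marks a lost packet."""
    output = []
    history = []  # frames already emitted, whether received or predicted
    for frame in packets:
        if frame is None:
            # Ask the predictor to synthesize the missing frame.
            frame = predict(history, frame_len)
        history.append(frame)
        output.extend(frame)
    return output

def repeat_last_frame(history, frame_len, decay=0.5):
    """Baseline predictor: replay the previous frame at reduced volume."""
    if not history:
        return [0.0] * frame_len  # nothing to repeat yet, emit silence
    return [decay * x for x in history[-1]]

# The second packet is lost and gets filled from the first one.
received = [[1.0, 2.0], None, [3.0, 4.0]]
stream = conceal_losses(received, frame_len=2, predict=repeat_last_frame)
print(stream)  # [1.0, 2.0, 0.5, 1.0, 3.0, 4.0]
```

Simple repetition like this is one source of the robotic artifacts mentioned above, which is the gap a trained predictor aims to close.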
The same krispNet DNN, trained on hundreds of hours of customized data, can predict the higher frequencies of the human voice and produce much richer voice audio than the original lower-bitrate audio. krispNet-HD takes audio sampled at 8 kHz as input and returns audio sampled at 16 kHz that sounds as if it had originally been sampled at 16 kHz.
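For context on the input and output shapes involved: audio sampled at 8 kHz can represent frequencies only up to 4 kHz (the Nyquist limit), while 16 kHz audio can represent up to 8 kHz. The sketch below is a plain linear-interpolation upsampler, shown only as a baseline and not as krispNet-HD's method. It doubles the number of samples but cannot restore the 4 to 8 kHz speech content that was never captured. The function names are hypothetical.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at a given sample rate."""
    return sample_rate_hz / 2

def upsample_2x_linear(samples):
    """Double the sample rate by inserting the midpoint between neighbours.

    Baseline only: it interpolates existing samples and predicts nothing new.
    """
    out = []
    for i, x in enumerate(samples):
        out.append(x)
        # At the end of the clip there is no next sample, so hold the last value.
        nxt = samples[i + 1] if i + 1 < len(samples) else x
        out.append((x + nxt) / 2)
    return out

narrowband = [0.0, 1.0, 0.0]              # three samples at 8 kHz
wideband = upsample_2x_linear(narrowband)  # six samples at 16 kHz
print(wideband)                            # [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
print(nyquist_hz(8000), nyquist_hz(16000)) # 4000.0 8000.0
```

A model that predicts the missing upper band has the same interface (N samples in, 2N samples out) but fills the new bandwidth with plausible voice content instead of interpolated values.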