krisp

In August, we introduced Accent Conversion v3.7 with major improvements in naturalness, fluency, and voice stability, starting with the Indian accent pack. That release marked a turning point—showing how much closer we can get to native-like, stable, and intelligible speech for global contact centers.

Today, we’re excited to announce that the Filipino accent pack is now upgraded to v3.7.

Key Improvements in Filipino v3.7

Through rigorous testing and customer feedback, v3.7 shows clear gains over v3.5 across all major dimensions:

  • Naturalness: Converted speech is significantly more human-like and conversational. Crowdsourced model comparisons demonstrated a 32% stronger preference for v3.7 over the previous version.
  • Pronunciation Accuracy: Enhanced phoneme pronunciation and intelligibility, with a ~9.31% relative improvement in Phoneme Error Rate (PER) on customer datasets. This improvement is largely driven by incorporating more conversational data during training. Accent-specific gains include more native-like articulation of consonants such as “t,” “p,” and “r.”
  • Voice Stability: Greater consistency in pitch and tone throughout speech, reducing unnatural fluctuations. This contributes directly to more natural and stable-sounding output.
  • Speech & Audio Clarity: Clearer audio with fewer artifacts and distortions, particularly in cases of slurred or mumbled speech. Crowdsourced model comparisons showed a 37% stronger preference for v3.7 in terms of overall clarity and intelligibility.
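The post does not say how Phoneme Error Rate or its relative improvement is computed. As a rough sketch, assuming the common definition (phoneme-level edit distance divided by the reference length), the calculation might look like the following. The function names and the phoneme sequences are hypothetical and purely illustrative; they are not taken from the customer datasets.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER = phoneme edit distance / number of reference phonemes."""
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(per_old: float, per_new: float) -> float:
    """Relative PER reduction from the old model to the new one, in percent."""
    return (per_old - per_new) / per_old * 100.0


# Hypothetical phoneme sequences for the word "support" (made up for illustration).
reference = ["S", "AH", "P", "AO", "R", "T"]
older_output = ["S", "AH", "F", "AO", "R", "D"]   # two phoneme errors
newer_output = ["S", "AH", "P", "AO", "R", "D"]   # one phoneme error

per_old = phoneme_error_rate(reference, older_output)
per_new = phoneme_error_rate(reference, newer_output)
print(f"PER old={per_old:.3f}, new={per_new:.3f}, "
      f"relative improvement={relative_improvement(per_old, per_new):.1f}%")
```

A relative improvement is measured against the older model's error rate, so a ~9.31% relative gain means the new PER is roughly 0.91 times the old one, not that PER dropped by 9.31 percentage points.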

Evaluation Results

For subjective and objective evaluations, 57 real-world recordings were sampled.

For the crowdsourced model comparisons, each recording received exactly 40 independent votes to ensure statistical confidence, for a total of 2,280 votes per comparison.

The results shown in the table below represent aggregated averages across all recordings.

| Metric | Filipino AC v3.5 | Filipino AC v3.7 | Comment |
| --- | --- | --- | --- |
| Crowdsourced Evaluation – “How natural does the voice sound?” (1 to 5) | 3.56 | 3.71 (+4%) | 57 real-world audio recordings assessed by 30 participants |
| Crowdsourced Models’ Comparison – Which option sounds more natural? | 982 | 1298 (+32%) | 57 real-world audio recording pairs, each assessed by 40 participants, 2,280 votes in total |
| Crowdsourced Models’ Comparison – Which speech sounds clearer and more intelligible? | 961 | 1319 (+37%) | 57 real-world audio recording pairs, each assessed by 40 participants, 2,280 votes in total |
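The percentages in the table appear to be relative gains of the v3.7 figure over the v3.5 figure. The short sketch below recomputes them from the published numbers as a sanity check. The helper names are hypothetical, and treating the percentages as relative gains is an assumption that happens to match the reported values.

```python
# Figures copied from the table above: (v3.5 votes, v3.7 votes).
comparisons = {
    "naturalness": (982, 1298),
    "clarity": (961, 1319),
}

RECORDINGS = 57
VOTES_PER_RECORDING = 40
TOTAL_VOTES = RECORDINGS * VOTES_PER_RECORDING  # 57 * 40 = 2280


def relative_gain(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def vote_share(votes: int, total: int) -> float:
    """Share of all votes, in percent."""
    return votes / total * 100.0


for name, (v35, v37) in comparisons.items():
    # Every vote goes to exactly one of the two models.
    assert v35 + v37 == TOTAL_VOTES
    print(f"{name}: +{relative_gain(v35, v37):.0f}% votes for v3.7, "
          f"{vote_share(v37, TOTAL_VOTES):.1f}% of all votes")

# Mean naturalness rating on the 1-to-5 scale.
print(f"naturalness rating: +{relative_gain(3.56, 3.71):.0f}%")
```

Under this reading, "+32%" means v3.7 collected 32% more votes than v3.5, which corresponds to about 56.9% of all naturalness votes going to v3.7; for clarity, "+37%" corresponds to about 57.9%.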

 

Comparative Audio Samples

Listening Tip: For the most accurate and immersive comparison between v3.5 and v3.7 Accent Conversion, we recommend using quality headphones.

This helps highlight the improvements in clarity, naturalness, and speaker identity preservation that may be less perceptible on laptop or mobile speakers.

 

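| # | Improvement Category | Original | Converted AC v3.5 | Converted AC v3.7 |
| --- | --- | --- | --- | --- |
| 1 | Speech Naturalness, Speech Clarity | [audio] | [audio] | [audio] |
| 2 | Speech Naturalness, Less Accent Leakage | [audio] | [audio] | [audio] |
| 3 | Speech Naturalness, Speech Clarity | [audio] | [audio] | [audio] |
| 4 | Speech Clarity, Better Phonemes (we, for) | [audio] | [audio] | [audio] |
| 5 | Speech Naturalness, Voice Stability | [audio] | [audio] | [audio] |
| 6 | Speech Clarity, Speech Naturalness | [audio] | [audio] | [audio] |
| 7 | Speech Naturalness, Speech Clarity | [audio] | [audio] | [audio] |
| 8 | Speech Clarity, Better Phonemes (check support) | [audio] | [audio] | [audio] |
| 9 | Speech Naturalness, Less Accent Leakage | [audio] | [audio] | [audio] |
| 10 | Speech Clarity, Better Phonemes (questions) | [audio] | [audio] | [audio] |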
