Following the successful rollout of Accent Conversion v3.7 for Indian and Filipino accent packs, we are excited to announce that the Latin American (LatAm) accent pack has now been upgraded to v3.7.
This update improves speaker similarity and overall voice stability. As a result, the converted speech sounds closer to the original voice and preserves its unique qualities while remaining clear, stable, and easy to understand.
Key Improvements in LatAm v3.7
- Speaker Similarity: Noticeably stronger preservation of the original speaker’s voice. Objective evaluations showed a 10% improvement in similarity compared to v3.5.
- Voice Stability: More consistent pitch and tone throughout speech, eliminating artificial fluctuations and producing a smoother, more natural output.
- Naturalness: With enhanced similarity and stability, converted speech is perceived as more human-like and fluid. Crowdsourced model comparisons demonstrated a 9% increase in naturalness scores for v3.7.
Evaluation Results
Our evaluation combined both objective metrics and subjective, crowdsourced testing to ensure robust validation:
- 37 real-world recordings were sampled for evaluation.
- For the crowdsourced study, each recording received 40 independent votes, yielding a total of 1,480 votes and ensuring statistical confidence in the results.
- The reported results represent aggregated averages across all recordings.
These findings consistently confirm the noticeable quality improvements delivered by v3.7.
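The study arithmetic above can be checked with a short sketch. The recording and vote counts come from the evaluation description; everything else is an assumption for illustration. In particular, the helper names `total_votes` and `aggregate_scores` are hypothetical, the example ratings are placeholders rather than study data, and averaging within each recording before averaging across recordings is only one plausible reading of "aggregated averages".

```python
from statistics import mean

NUM_RECORDINGS = 37       # from the evaluation description
VOTES_PER_RECORDING = 40  # from the evaluation description


def total_votes(num_recordings: int, votes_per_recording: int) -> int:
    """Total number of votes collected across all recordings."""
    return num_recordings * votes_per_recording


def aggregate_scores(votes_by_recording: list[list[float]]) -> float:
    """Average the votes within each recording, then average across recordings.

    Assumption: this two-step mean is how results were aggregated;
    the announcement does not spell out the exact procedure.
    """
    per_recording = [mean(votes) for votes in votes_by_recording]
    return mean(per_recording)


print(total_votes(NUM_RECORDINGS, VOTES_PER_RECORDING))  # 1480

# Hypothetical 1-to-5 ratings for three recordings (not real study data).
example_votes = [[3, 4, 4], [2, 3, 4], [5, 4, 3]]
print(round(aggregate_scores(example_votes), 2))
```

Because every recording receives the same number of votes, this two-step mean equals the mean over all votes pooled together, so either reading gives the same aggregate here.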
Metric | LatAm AC 3.5 | LatAm AC 3.7 | Comment |
---|---|---|---|
Speaker Similarity (0 to 1) | 0.70 | 0.77 (+10%) | Objective metric that measures similarity between two voices; higher is better. 37 real-world audio recordings assessed by 30 participants |
Crowdsourced Evaluation – “How natural does the voice sound?” (1 to 5) | 3.35 | 3.44 (+9%) | 37 real-world audio recordings assessed by 30 participants |
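As a quick check on the speaker similarity row, the reported percentage is consistent with a relative change over the v3.5 baseline. The sketch below assumes that this is how the figure was derived; the function name `relative_change_pct` is hypothetical and not part of any product API.

```python
def relative_change_pct(baseline: float, updated: float) -> float:
    """Relative change from baseline to updated, expressed in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (updated - baseline) / baseline * 100.0


# Speaker similarity scores taken from the table: v3.5 = 0.70, v3.7 = 0.77.
print(round(relative_change_pct(0.70, 0.77)))  # 10
```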
Comparative audio samples
Listening Tip: For the most accurate and immersive comparison between Accent Conversion v3.5 and v3.7, we recommend using quality headphones.
Headphones bring out improvements in clarity, naturalness, and speaker identity preservation that may not be as noticeable on laptop or mobile speakers.
# | Improvement Category | Original | Converted AC v3.5 | Converted AC v3.7 |
---|---|---|---|---|
1 | Speaker Similarity, Voice Stability | |||
2 | Speaker Similarity, Speech Naturalness | |||
3 | Speaker Similarity | |||
4 | Speaker Similarity, Speech Naturalness | |||
5 | Speaker Similarity | |||
6 | Speaker Similarity | |||