🤖 AI Summary
This paper addresses the lack of unified, fair, and reproducible evaluation standards for voice cloning TTS models by introducing VCTK-Bench, the first open-source voice cloning benchmark. Methodologically, it establishes an end-to-end automated evaluation framework that quantifies three core dimensions (speaker similarity, naturalness, and robustness), incorporating ASR- and SSL-based speaker verification, MOS prediction models, adversarial sample generation, and cross-lingual generalization assessment. Key contributions include: (1) a standardized evaluation protocol; (2) a lightweight, open-source Python evaluation library; and (3) a dynamic, transparent, and continuously updated community leaderboard. Experiments across 12 state-of-the-art models show a strong correlation between automatic scores and human MOS ratings (Spearman ρ = 0.92), substantially improving evaluation efficiency and reproducibility.
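As one concrete illustration of the scoring ideas named above, the sketch below computes a cosine speaker-similarity score over embeddings and checks how well automatic scores track human MOS ratings with Spearman's ρ. It is a minimal sketch under assumptions: the embedding extractor is a placeholder (any ASR- or SSL-based speaker encoder could supply the vectors), and all scores are toy values, not results from the paper.

```python
# Minimal sketch, not the benchmark's actual API: cosine speaker similarity
# over speaker embeddings, plus Spearman correlation between automatic
# scores and human MOS. Embeddings and scores here are illustrative.
import numpy as np
from scipy.stats import spearmanr

def speaker_similarity(emb_ref: np.ndarray, emb_syn: np.ndarray) -> float:
    """Cosine similarity between reference and synthesized speaker embeddings."""
    return float(np.dot(emb_ref, emb_syn)
                 / (np.linalg.norm(emb_ref) * np.linalg.norm(emb_syn)))

# Toy validation of automatic scores against human MOS, one value per model.
auto_scores = np.array([0.81, 0.74, 0.90, 0.62, 0.78])  # automatic scores
human_mos   = np.array([4.1, 3.8, 4.5, 3.2, 4.0])       # mean human MOS ratings

rho, p_value = spearmanr(auto_scores, human_mos)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```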
📝 Abstract
We present a novel benchmark for voice cloning text-to-speech models, consisting of an evaluation protocol, an open-source library for assessing the performance of voice cloning models, and an accompanying leaderboard. The paper discusses design considerations and gives a detailed description of the evaluation procedure. It also explains how to use the software library and how results are organized on the leaderboard.
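To make the leaderboard organization concrete, here is a hypothetical sketch of how per-model scores along the three protocol dimensions might be aggregated and ranked. The field names, rescaling, and aggregation rule are illustrative assumptions, not the benchmark's actual schema.

```python
# Hypothetical leaderboard layout: one row per model covering the three
# protocol dimensions, ranked by a placeholder unweighted mean. Names and
# numbers are invented for illustration only.
from statistics import mean

results = {
    "model_a": {"speaker_similarity": 0.86, "naturalness": 4.2, "robustness": 0.91},
    "model_b": {"speaker_similarity": 0.79, "naturalness": 4.4, "robustness": 0.88},
}

def overall(scores: dict) -> float:
    """Unweighted mean across dimensions (a placeholder aggregation rule)."""
    # Naturalness is on a 1-5 MOS-like scale; rescale to [0, 1] before averaging.
    rescaled = dict(scores, naturalness=(scores["naturalness"] - 1) / 4)
    return mean(rescaled.values())

leaderboard = sorted(results.items(), key=lambda kv: overall(kv[1]), reverse=True)
for rank, (model, scores) in enumerate(leaderboard, start=1):
    print(rank, model, f"{overall(scores):.3f}")
```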