SingMOS-Pro: A Comprehensive Benchmark for Singing Quality Assessment

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing singing quality evaluation methods suffer from the high cost of subjective assessment and the insufficient perceptual coverage of objective metrics. To address this, we introduce SingMOS-Pro, the first large-scale, multi-dimensional singing quality evaluation benchmark, comprising 7,981 singing segments generated by 41 models across 12 source datasets, with fine-grained Mean Opinion Score (MOS) annotations for lyric accuracy, pitch consistency, and overall quality. Leveraging professional human ratings and systematic benchmarking, we conduct the first comprehensive assessment of mainstream objective metrics, including PEAQ, CREPE, and DeepMOS, in the singing domain. SingMOS-Pro significantly broadens the completeness and practicality of the evaluation dimensions and provides a reproducible, open-source, state-of-the-art baseline, establishing a robust foundation for future research on singing synthesis quality evaluation.

📝 Abstract
Singing voice generation is progressing rapidly, yet evaluating singing quality remains a critical challenge. Human subjective assessment, typically in the form of listening tests, is costly and time-consuming, while existing objective metrics capture only limited perceptual aspects. In this work, we introduce SingMOS-Pro, a dataset for automatic singing quality assessment. Building on our preview version SingMOS, which provides only overall ratings, SingMOS-Pro extends the annotations of the newly added clips to cover lyrics, melody, and overall quality, offering broader coverage and greater diversity. The dataset contains 7,981 singing clips generated by 41 models across 12 datasets, spanning early systems to recent advances. Each clip receives at least five ratings from professional annotators, ensuring reliability and consistency. Furthermore, we explore how to effectively utilize MOS data annotated under different standards and benchmark several widely used evaluation methods from related tasks on SingMOS-Pro, establishing strong baselines and practical references for future research. The dataset can be accessed at https://huggingface.co/datasets/TangRain/SingMOS-Pro.
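
For readers who want to try the data directly, the sketch below loads the repository linked above with the Hugging Face `datasets` library. The repository ID comes from the abstract; everything else (that the dataset loads without a config, and the comment about audio-column layout) is an assumption about a typical Hub dataset, so check the dataset card for the actual splits and columns.

```python
# Minimal sketch, assuming the SingMOS-Pro repository can be loaded directly with
# the Hugging Face `datasets` library; consult the dataset card for the actual
# splits, column names, and any loading configuration it requires.
from datasets import load_dataset

ds = load_dataset("TangRain/SingMOS-Pro")   # repository ID from the abstract
print(ds)                                   # shows the available splits and their columns

first_split = list(ds.keys())[0]
sample = ds[first_split][0]                 # first clip of the first split
for name, value in sample.items():
    # Audio-type columns typically decode to {"array": ..., "sampling_rate": ...}.
    print(name, type(value))
```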
Problem

Research questions and friction points this paper is trying to address.

Evaluating singing quality remains a critical challenge
Existing objective metrics capture limited perceptual aspects
Human subjective assessment is costly and time-consuming
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expanded dataset with lyrics and melody annotations
Professional multi-rater evaluation for reliability assurance
Benchmarked evaluation methods for strong baselines
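
As background on what benchmarking such baselines typically involves, the sketch below scores a hypothetical MOS predictor against human labels using the utterance-level and system-level correlations commonly reported for this kind of baseline (Pearson LCC and Spearman SRCC via SciPy). It illustrates the generic practice only; the function name, inputs, and toy numbers are assumptions, not the paper's protocol.

```python
# Illustrative sketch of generic MOS-benchmark scoring, not the paper's protocol:
# compare a predictor's scores with human MOS at the utterance and system level.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def score_predictor(pred, human, systems):
    """pred/human: per-clip scores; systems: system ID per clip (hypothetical inputs)."""
    pred = np.asarray(pred, dtype=float)
    human = np.asarray(human, dtype=float)

    utt_lcc, _ = pearsonr(pred, human)       # utterance-level linear correlation
    utt_srcc, _ = spearmanr(pred, human)     # utterance-level rank correlation

    # System level: average the clips of each system, then correlate the averages.
    sys_ids = sorted(set(systems))
    sys_pred = [pred[[s == sid for s in systems]].mean() for sid in sys_ids]
    sys_human = [human[[s == sid for s in systems]].mean() for sid in sys_ids]
    sys_lcc, _ = pearsonr(sys_pred, sys_human)
    sys_srcc, _ = spearmanr(sys_pred, sys_human)

    return {"utt_LCC": utt_lcc, "utt_SRCC": utt_srcc,
            "sys_LCC": sys_lcc, "sys_SRCC": sys_srcc}

# Toy usage with made-up scores for four clips from three systems.
print(score_predictor([3.1, 4.0, 2.5, 3.8], [3.5, 4.2, 2.8, 3.6], ["A", "A", "B", "C"]))
```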
👥 Authors
Yuxun Tang, Renmin University of China
Lan Liu, Sun Yat-sen University
Wenhao Feng, State Key Laboratory of Robotics and System, Harbin Institute of Technology (Robotics, Space robotics, Artificial Intelligence)
Yiwen Zhao, Carnegie Mellon University
Jionghao Han, Carnegie Mellon University
Yifeng Yu, Tsinghua University (Sampling, Diffusion model)
Jiatong Shi, Carnegie Mellon University
Qin Jin, School of Information, Renmin University of China (Artificial Intelligence)