Multidimensional Music Aesthetic Evaluation via Semantically Consistent C-Mixup Augmentation

📅 2025-11-24
🤖 AI Summary
Aesthetic evaluation of generated music remains challenging due to the complexity of perceptual dimensions. To address this, we propose a multi-scale hierarchical evaluation framework: (1) a cross-paragraph attention mechanism jointly models local musical details and global structural coherence, integrated with a multi-scale convolutional network; (2) a semantics-preserving C-Mixup audio augmentation strategy enhances data diversity and model robustness; and (3) a joint regression-ranking optimization objective enables consistent learning across segment-level score prediction and full-track ranking. Evaluated on the ICASSP 2026 SongEval benchmark, our method significantly outperforms existing baselines, achieving a 12.3% improvement in Pearson correlation coefficient and a 9.7% gain in Top-10 high-quality song identification accuracy. To our knowledge, this is the first approach to effectively balance multidimensional aesthetic consistency with end-to-end trainability.
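The summary does not spell out how the C-Mixup step works, so the following is only a minimal sketch of the general C-Mixup idea: a mixing partner is sampled with probability that decays with label distance, and features and labels are then interpolated with the same coefficient. The function names, the Gaussian `bandwidth`, the Beta parameter `alpha`, and the use of plain feature vectors in place of audio are all assumptions for illustration, not details taken from the paper.

```python
import math
import random


def cmixup_pair_probs(labels, i, bandwidth=1.0):
    """Sampling probabilities for a mixing partner of example i.

    Partners whose labels are close to labels[i] get higher probability
    (Gaussian kernel on label distance). Example i itself is excluded.
    Assumes at least one other example has non-negligible kernel weight.
    """
    weights = [
        0.0 if j == i else math.exp(-((y - labels[i]) ** 2) / (2 * bandwidth ** 2))
        for j, y in enumerate(labels)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def cmixup(features, labels, i, bandwidth=1.0, alpha=2.0, rng=random):
    """Mix example i with a label-near partner; returns (x_mix, y_mix)."""
    probs = cmixup_pair_probs(labels, i, bandwidth)
    # Draw the partner index according to the label-similarity distribution.
    j = rng.choices(range(len(labels)), weights=probs, k=1)[0]
    # Mixing coefficient in (0, 1), shared by features and label.
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1 - lam) * b for a, b in zip(features[i], features[j])]
    y_mix = lam * labels[i] + (1 - lam) * labels[j]
    return x_mix, y_mix


# Hypothetical toy data: three clips with 2-d features and aesthetic scores.
features = [[0.0, 1.0], [0.2, 0.8], [5.0, 5.0]]
labels = [1.0, 1.2, 4.5]
probs = cmixup_pair_probs(labels, 0, bandwidth=0.5)
x_mix, y_mix = cmixup(features, labels, 0, bandwidth=0.5, rng=random.Random(0))
```

With a small bandwidth, the clip scored 1.0 is almost always paired with the clip scored 1.2 rather than the one scored 4.5, which is what keeps the interpolated label plausible for the interpolated input.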

📝 Abstract
Evaluating the aesthetic quality of generated songs is challenging due to the multi-dimensional nature of musical perception. We propose a robust music aesthetic evaluation framework that combines (1) multi-source multi-scale feature extraction to obtain complementary segment- and track-level representations, (2) a hierarchical audio augmentation strategy to enrich training data, and (3) a hybrid training objective that integrates regression and ranking losses for accurate scoring and reliable top-song identification. Experiments on the ICASSP 2026 SongEval benchmark demonstrate that our approach consistently outperforms baseline methods across correlation and top-tier metrics.
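The abstract refers to "correlation and top-tier metrics" without defining them. As a rough illustration, the sketch below computes a Pearson correlation between predicted and reference scores and a simple top-k overlap. The `top_k_overlap` definition is an assumption made here and may differ from the benchmark's official top-tier metric.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def top_k_overlap(pred, target, k):
    """Fraction of the k best songs by reference score that the model
    also places in its own top k (hypothetical top-tier metric)."""
    def top_indices(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(order[:k])

    return len(top_indices(pred) & top_indices(target)) / k


# Hypothetical scores for four songs.
predicted = [0.9, 0.1, 0.5, 0.7]
reference = [4.0, 1.0, 2.0, 3.5]
r = pearson(predicted, reference)
overlap = top_k_overlap(predicted, reference, k=2)
```

The two numbers capture different things: correlation rewards getting the whole score range right, while the overlap only checks whether the best songs are found.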
Problem

Research questions and friction points this paper is trying to address.

Evaluating the multidimensional aesthetic quality of generated songs
Addressing musical perception challenges through a robust evaluation framework
Improving scoring accuracy and top-song identification reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-source multi-scale feature extraction for complementary segment- and track-level representations
Hierarchical audio augmentation to enrich training data
Hybrid training objective integrating regression and ranking losses
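One common way to combine regression and ranking losses is to add a pairwise hinge term to a mean squared error. The sketch below follows that pattern in plain Python. The page does not specify the paper's actual loss, so the pairwise formulation, `rank_weight`, and `margin` are assumptions, and a real training loop would use an autodiff framework rather than lists.

```python
def hybrid_loss(pred, target, rank_weight=0.5, margin=0.0):
    """Mean squared error plus a weighted pairwise hinge ranking loss.

    For every pair (i, j) where the reference says song i is better than
    song j, the ranking term penalizes predictions that do not score i
    at least `margin` higher than j.
    """
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n

    pair_losses = []
    for i in range(n):
        for j in range(n):
            if target[i] > target[j]:
                pair_losses.append(max(0.0, margin - (pred[i] - pred[j])))
    # No ordered pairs (e.g. all targets equal) means no ranking penalty.
    rank = sum(pair_losses) / len(pair_losses) if pair_losses else 0.0

    return mse + rank_weight * rank


# Perfect predictions incur no loss; a fully reversed ordering is
# penalized by both terms.
perfect = hybrid_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
reversed_order = hybrid_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], rank_weight=1.0)
```

The regression term anchors predictions to the score scale, while the ranking term focuses on relative order, which matters most for identifying the top songs.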
Shuyang Liu
University of Illinois Urbana-Champaign
Machine Learning, Program Analysis
Yuan Jin
Apple
Quantum Cascade Lasers, Semiconductor Physics, Integrated Photonics
Rui Lin
ϵar-LAB, ZiYouLiangJi(Shanghai) Information Technology Co., Ltd
Shizhe Chen
INRIA Paris
Computer Vision, Vision-and-Language
Junyu Dai
ϵar-LAB, ZiYouLiangJi(Shanghai) Information Technology Co., Ltd
Tao Jiang
ϵar-LAB, ZiYouLiangJi(Shanghai) Information Technology Co., Ltd