Multidimensional Music Aesthetic Evaluation via Semantically Consistent C-Mixup Augmentation

📅 2025-11-24

🤖 AI Summary

Aesthetic evaluation of generated music remains challenging because perceptual quality spans many dimensions. To address this, we propose a multi-scale hierarchical evaluation framework: (1) a cross-paragraph attention mechanism jointly models local musical details and global structural coherence, integrated with a multi-scale convolutional network; (2) a semantics-preserving C-Mixup audio augmentation strategy enhances data diversity and model robustness; and (3) a regression-ranking joint optimization objective enables consistent learning across segment-level score prediction and full-track ranking. Evaluated on the ICASSP 2026 SongEval benchmark, our method significantly outperforms existing baselines, achieving a 12.3% improvement in Pearson correlation coefficient and a 9.7% gain in Top-10 high-quality song identification accuracy. To our knowledge, this is the first approach to effectively balance multidimensional aesthetic consistency with end-to-end trainability.
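
For readers unfamiliar with C-Mixup, the sketch below illustrates the general idea as it is usually described for regression: the mixing partner of an example is sampled with probability proportional to a Gaussian kernel on label distance, so examples with similar scores are mixed more often. This is not the paper's implementation. The function names, the scalar label, the `bandwidth` and `alpha` parameters, and the choice to exclude an example from being its own partner are all assumptions made for illustration, and the sketch expects at least two examples.

```python
import math
import random


def cmixup_partner_probs(labels, i, bandwidth=1.0):
    """Sampling probabilities for the mixing partner of example i.

    Weights follow a Gaussian kernel on label distance; example i itself
    gets weight 0 (an assumption of this sketch).
    """
    weights = [
        0.0 if j == i else math.exp(-((labels[i] - y) ** 2) / (2.0 * bandwidth ** 2))
        for j, y in enumerate(labels)
    ]
    total = sum(weights)
    if total == 0.0:
        # Every kernel value underflowed: fall back to a uniform choice.
        weights = [0.0 if j == i else 1.0 for j in range(len(labels))]
        total = sum(weights)
    return [w / total for w in weights]


def cmixup_pair(features, labels, i, bandwidth=1.0, alpha=2.0, rng=random):
    """Mix example i with a label-similar partner; returns (x, y, partner)."""
    probs = cmixup_partner_probs(labels, i, bandwidth)
    j = rng.choices(range(len(labels)), weights=probs, k=1)[0]
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(features[i], features[j])]
    mixed_y = lam * labels[i] + (1.0 - lam) * labels[j]
    return mixed_x, mixed_y, j
```

A smaller `bandwidth` concentrates sampling on near-identical scores, which keeps the interpolated label plausible at the cost of less diverse mixed examples.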

📝 Abstract

Evaluating the aesthetic quality of generated songs is challenging due to the multi-dimensional nature of musical perception. We propose a robust music aesthetic evaluation framework that combines (1) multi-source multi-scale feature extraction to obtain complementary segment- and track-level representations, (2) a hierarchical audio augmentation strategy to enrich training data, and (3) a hybrid training objective that integrates regression and ranking losses for accurate scoring and reliable top-song identification. Experiments on the ICASSP 2026 SongEval benchmark demonstrate that our approach consistently outperforms baseline methods across correlation and top-tier metrics.

Problem

Research questions and friction points this paper is trying to address.

Evaluating the multidimensional aesthetic quality of generated songs

Addressing the challenges of multi-dimensional musical perception through a robust evaluation framework

Improving scoring accuracy and the reliability of top-song identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-source, multi-scale feature extraction for complementary segment- and track-level representations

Hierarchical audio augmentation to enrich training data

Hybrid training objective integrating regression and ranking losses
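
As a rough illustration of the hybrid objective, the sketch below adds a pairwise ranking term to a mean-squared-error regression term. The summary does not say which ranking loss the authors use, so the pairwise hinge form, the `margin` value, and the `rank_weight` balance are assumptions chosen only to show how the two terms can be combined.

```python
def mse_loss(pred, target):
    """Mean squared error between predicted and reference scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pairwise_hinge_loss(pred, target, margin=0.1):
    """Average hinge penalty over pairs whose reference scores differ.

    For every ordered pair (a, b) with target[a] > target[b], the prediction
    for a should exceed the prediction for b by at least `margin`.
    """
    total, count = 0.0, 0
    for a in range(len(pred)):
        for b in range(len(pred)):
            if target[a] > target[b]:
                total += max(0.0, margin - (pred[a] - pred[b]))
                count += 1
    return total / count if count else 0.0


def hybrid_loss(pred, target, rank_weight=0.5, margin=0.1):
    """Regression term plus a weighted ranking term (weights are assumptions)."""
    return mse_loss(pred, target) + rank_weight * pairwise_hinge_loss(pred, target, margin)
```

The regression term pulls each predicted score toward its reference value, while the ranking term penalizes only order violations, which is the property that matters when picking the top songs.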
