Music Arena: Live Evaluation for Text-to-Music

📅 2025-07-28
🤖 AI Summary
Problem: Text-to-music (TTM) research lacks an open, scalable platform for human preference evaluation. As a result, listening studies are costly and hard to compare across systems, and no public preference data exists to support alignment or automatic evaluation metrics. Method: The authors introduce Music Arena, an open platform for live TTM evaluation featuring: (i) an LLM-based routing system that handles the heterogeneous type signatures of TTM systems; (ii) a standardized listening protocol that collects detailed preference data, including listening data and natural language feedback; and (iii) a rolling data release policy with user privacy guarantees. Contribution/Results: The platform compiles a leaderboard from user preferences and provides a renewable source of preference data. It aims to improve transparency and cross-system comparability in TTM evaluation and to serve as infrastructure for preference-driven TTM research.

📝 Abstract
We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare, as study protocols may differ across systems. Moreover, human preferences might help researchers align their TTM systems or improve automatic evaluation metrics, but an open and renewable source of preferences does not currently exist. We aim to fill these gaps by offering *live* evaluation for TTM. In Music Arena, real-world users input text prompts of their choosing and compare outputs from two TTM systems, and their preferences are used to compile a leaderboard. While Music Arena follows recent evaluation trends in other AI domains, we also design it with key features tailored to music: an LLM-based routing system to navigate the heterogeneous type signatures of TTM systems, and the collection of *detailed* preferences including listening data and natural language feedback. We also propose a rolling data release policy with user privacy guarantees, providing a renewable source of preference data and increasing platform transparency. Through its standardized evaluation protocol, transparent data access policies, and music-specific features, Music Arena not only addresses key challenges in the TTM ecosystem but also demonstrates how live evaluation can be thoughtfully adapted to unique characteristics of specific AI domains. Music Arena is available at: https://music-arena.org
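The abstract says pairwise preferences are compiled into a leaderboard but does not say which rating model is used. As a minimal sketch, assuming a Bradley-Terry model (a common choice for arena-style evaluation, not confirmed for Music Arena), the ranking could be computed as follows. The function names `bradley_terry` and `leaderboard` and the `(winner, loser)` input format are hypothetical.

```python
from collections import defaultdict


def bradley_terry(battles, iters=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Assumption: the source does not name a rating model; Bradley-Terry
    is used here purely for illustration.
    """
    wins = defaultdict(float)          # total wins per system
    pair_counts = defaultdict(float)   # number of battles per unordered pair
    systems = set()
    for winner, loser in battles:
        if winner == loser:
            continue  # ignore degenerate self-comparisons
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        systems.update((winner, loser))
    if not systems:
        return {}

    strength = {s: 1.0 for s in systems}
    for _ in range(iters):
        updated = {}
        for s in systems:
            # Minorize-maximize update: wins divided by expected exposure.
            denom = 0.0
            for t in systems:
                n = pair_counts.get(frozenset((s, t)), 0.0) if t != s else 0.0
                if n:
                    denom += n / (strength[s] + strength[t])
            updated[s] = wins[s] / denom if denom else strength[s]
        # Rescale so that strengths average to 1 (the scale is arbitrary).
        total = sum(updated.values())
        strength = {s: v * len(systems) / total for s, v in updated.items()}
    return strength


def leaderboard(battles):
    """Return system names sorted from strongest to weakest."""
    strength = bradley_terry(battles)
    return sorted(strength, key=strength.get, reverse=True)
```

For example, `leaderboard([("A", "B"), ("A", "B"), ("B", "A")])` places `"A"` ahead of `"B"`. A production leaderboard would likely also report confidence intervals and handle ties, which this sketch omits.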
Problem

Research questions and friction points this paper is trying to address.

Scalable human preference evaluation for text-to-music models
Standardized comparison of text-to-music systems via live user feedback
Open platform for renewable preference data and music-specific evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open platform for scalable human preference evaluation
LLM-based routing system for TTM systems
Rolling data release with privacy guarantees