🤖 AI Summary
This work addresses a longstanding limitation of text-to-music generation research: its reliance on proprietary data and industrial-scale computational resources, which has hindered fair and open academic benchmarking. To bridge this gap, we launch the ICME 2026 Grand Challenge on Academic Text-to-Music Generation (ATTM). It is built on a CC-licensed, instrumental-only subset of MTG-Jamendo and has two tracks: an Efficiency Track (at most 500M parameters) and a Performance Track (no parameter limit). In both tracks, participants must train their models from scratch. Our initiative establishes a standardized benchmark tailored to the academic community and introduces a novel Concept Coverage Score (CCS). We also release open-source baseline models, preprocessing pipelines, reference captions, and evaluation code for Fréchet Audio Distance (FAD) and CLAP score. Submissions are first scored with the objective metrics FAD, CLAP score, and CCS, and then assessed in a subjective listening test. This gives a reproducible, multi-dimensional assessment intended to lower the barrier to entry for research in this area.
📝 Abstract
This paper presents an overview and the technical framework of the ICME 2026 Grand Challenge on Academic Text-to-Music Generation (ATTM). Despite rapid progress in text-to-music generation (TTM) systems, the field is currently dominated by models trained on massive proprietary datasets with industrial-scale computational resources, creating a significant barrier for academic research. To address this, the ATTM Challenge establishes a fair-play benchmark that requires participants to train generative models strictly from scratch. All training uses a standardized, CC-licensed subset of the MTG-Jamendo dataset containing only instrumental music. The challenge is divided into two tracks: the Efficiency Track (limited to 500M parameters) and the Performance Track (no parameter limit). Submissions are evaluated in multiple stages. Objective metrics come first: Fréchet Audio Distance (FAD), CLAP score, and a novel Concept Coverage Score (CCS). A subjective listening test follows. By providing open-source baselines, preprocessing pipelines, reference captions, and public evaluation code for computing FAD and CLAP score, this challenge aims to facilitate and promote TTM research in academic contexts.
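Two of the objective metrics named above have widely used textbook definitions, sketched here under stated assumptions. FAD is the Fréchet distance between Gaussians fitted to embeddings of reference and generated audio:

$$\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right).$$

CLAP score is commonly computed as the cosine similarity between CLAP text and audio embeddings. The sketch below is a minimal illustration, not the challenge's released evaluation code. It assumes the embeddings have already been extracted, and it does not model which embedding network or CLAP checkpoint the official code uses, or any scaling it applies. The helper names `frechet_audio_distance` and `clap_style_score` are hypothetical. CCS is not sketched because its definition is not given here.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_audio_distance(emb_ref: np.ndarray, emb_gen: np.ndarray) -> float:
    """Hypothetical helper: Fréchet distance between Gaussians fit to two
    embedding sets (rows = clips, columns = embedding dimensions)."""
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.cov(emb_ref, rowvar=False)
    cov_g = np.cov(emb_gen, rowvar=False)
    # Tr((S_r S_g)^{1/2}) equals the sum of square roots of the eigenvalues of
    # S_r^{1/2} S_g S_r^{1/2}, which is symmetric PSD, so eigvalsh applies.
    root_r = _psd_sqrt(cov_r)
    middle = root_r @ cov_g @ root_r
    eig = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    trace_sqrt = float(np.sqrt(eig).sum())
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


def clap_style_score(text_emb: np.ndarray, audio_emb: np.ndarray) -> float:
    """Hypothetical helper: mean cosine similarity between paired text and
    audio embeddings (row i of each array belongs to the same prompt)."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    return float(np.mean(np.sum(t * a, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(500, 8))  # synthetic stand-in embeddings
    print(frechet_audio_distance(ref, ref))        # close to 0
    print(frechet_audio_distance(ref, ref + 1.0))  # close to 8.0 (unit shift in 8 dims)
```

With identical embedding sets the distance is close to zero. Shifting every dimension by 1 leaves the covariances unchanged, so only the mean term contributes, giving a value close to the dimensionality (8 here). The matrix square root is taken through a symmetric eigendecomposition so that the sketch needs only NumPy.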