CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models

📅 2024-10-17
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
Existing music search systems struggle with linguistic diversity and modality fragmentation, particularly in low-resource languages and cross-format retrieval (e.g., ABC notation, MIDI, text). To address these challenges, we propose the first multilingual multimodal music retrieval framework supporting 101 languages. Our method introduces: (i) a language-aligned unified multimodal encoder; (ii) high-quality, low-noise, balanced multilingual music descriptions generated by large language models (LLMs); and (iii) a trimodal contrastive learning mechanism over ABC, MIDI, and text to jointly optimize representations and enforce semantic alignment. Experiments demonstrate state-of-the-art performance on multilingual semantic music search and cross-modal music classification, with substantial gains, especially for low-resource languages. This work establishes a new benchmark for globally inclusive music information retrieval.
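Semantic search over a shared embedding space, as summarized above, can be illustrated with a minimal sketch. It assumes a text query and each music item have already been encoded into vectors of the same dimension; the file names, the three-dimensional toy vectors, and the helper names `cosine` and `rank_by_similarity` are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_similarity(query_vec, music_vecs):
    """Return item names sorted from most to least similar to the query."""
    scores = {name: cosine(query_vec, vec) for name, vec in music_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy embeddings (assumed values): in a real system these would come from
# the music encoder (ABC or MIDI input) and the multilingual text encoder.
music_index = {
    "tune_a.abc": [0.9, 0.1, 0.0],
    "tune_b.mid": [0.1, 0.9, 0.1],
    "tune_c.abc": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for an encoded text query in any language

ranking = rank_by_similarity(query, music_index)
print(ranking)
```

Because ABC and MIDI items live in the same space as the text query, a single ranking covers both formats, which is the cross-format retrieval the summary describes.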

๐Ÿ“ Abstract
Current music information retrieval systems face challenges in managing linguistic diversity and integrating various musical modalities. These limitations reduce their effectiveness in a global, multimodal music environment. To address these issues, we introduce CLaMP 2, a system compatible with 101 languages that supports both ABC notation (a text-based musical notation format) and MIDI (Musical Instrument Digital Interface) for music information retrieval. CLaMP 2, pre-trained on 1.5 million ABC-MIDI-text triplets, includes a multilingual text encoder and a multimodal music encoder aligned via contrastive learning. By leveraging large language models, we obtain refined and consistent multilingual descriptions at scale, significantly reducing textual noise and balancing language distribution. Our experiments show that CLaMP 2 achieves state-of-the-art results in both multilingual semantic search and music classification across modalities, thus establishing a new standard for inclusive and global music information retrieval.
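The abstract says the two encoders are aligned via contrastive learning but does not give the loss. A common choice for this kind of alignment is a symmetric InfoNCE objective over a batch of matched pairs, sketched below. The symmetric form, the temperature value of 0.07, and the function names are assumptions for illustration and may differ from what CLaMP 2 actually uses.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def contrastive_loss(text_embs, music_embs, temperature=0.07):
    """Symmetric InfoNCE: pair i in text_embs matches pair i in music_embs."""
    t = [normalize(v) for v in text_embs]
    m = [normalize(v) for v in music_embs]
    n = len(t)
    # logits[i][j]: scaled similarity between text i and music j
    logits = [[sum(a * b for a, b in zip(ti, mj)) / temperature for mj in m]
              for ti in t]
    # Text-to-music direction: each row should peak on its diagonal entry.
    t2m = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Music-to-text direction: same check over the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    m2t = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (t2m + m2t)

# Toy batch (assumed values): matched pairs give a low loss, mismatched a high one.
aligned = contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(aligned < shuffled)
```

Minimizing this loss pulls each description toward its own piece of music and pushes it away from the other pieces in the batch, which is what makes nearest-neighbor search across text and music meaningful.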
Problem

Research questions and friction points this paper is trying to address.

multilingual music search
diverse music genres
global music retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

CLaMP 2
Multilingual Music Search
Language Model Optimization