Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music

📅 2025-09-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Unsupervised discovery of “music words”—reusable, structurally coherent musical fragments—in symbolic music remains challenging due to inherent semantic ambiguity. Method: We propose a sparse coding–based statistical modeling framework that formalizes music word discovery as an optimization problem minimizing total encoding length, thereby aligning with human perceptual coding principles and mitigating ambiguity. Our approach employs a two-stage EM algorithm integrating sparse representation learning with pattern dictionary learning, enabling interpretable decomposition and reconstruction of musical sequences. Contribution/Results: Evaluated on a multi-style dataset, the discovered music words achieve an IoU of 0.61 against expert annotations and demonstrate strong cross-style generalization. Beyond providing interpretable, structure-aware priors for music generation, classification, and style transfer, this work establishes the first information-compression–driven paradigm for structural analysis in computational musicology.

Technology Category

Application Category

📝 Abstract
This paper presents an unsupervised machine learning algorithm that identifies recurring patterns -- referred to as ``music-words'' -- from symbolic music data. These patterns are fundamental to musical structure and reflect the cognitive processes involved in composition. However, extracting these patterns remains challenging because of the inherent semantic ambiguity in musical interpretation. We formulate the task of music-word discovery as a statistical optimization problem and propose a two-stage Expectation-Maximization (EM)-based learning framework: 1. Developing a music-word dictionary; 2. Reconstructing the music data. When evaluated against human expert annotations, the algorithm achieved an Intersection over Union (IoU) score of 0.61. Our findings indicate that minimizing code length effectively addresses semantic ambiguity, suggesting that human optimization of encoding systems shapes musical semantics. This approach enables computers to extract ``basic building blocks'' from music data, facilitating structural analysis and sparse encoding. The method has two primary applications. First, in AI music, it supports downstream tasks such as music generation, classification, style transfer, and improvisation. Second, in musicology, it provides a tool for analyzing compositional patterns and offers insights into the principle of minimal encoding across diverse musical styles and composers.
Problem

Research questions and friction points this paper is trying to address.

Identifies recurring patterns in symbolic music data
Addresses semantic ambiguity through statistical optimization
Enables structural analysis and sparse encoding of music
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised learning algorithm identifies musical patterns
Two-stage EM framework optimizes music-word discovery
Minimizing code length resolves musical semantic ambiguity
🔎 Similar Papers
No similar papers found.
Tianle Wang
Tianle Wang
Brookhaven National Lab
High performance computation
S
Sirui Zhang
Central Conservatory of Music, China
X
Xinyi Tong
Central Conservatory of Music, China
Peiyang Yu
Peiyang Yu
Carnegie Mellon Univeristy
Large Language ModelsFake News DetectionMisinformation Detection
J
Jishang Chen
Central Conservatory of Music, China
L
Liangke Zhao
Central Conservatory of Music, China
X
Xinpu Gao
Department of Industrial Engineering, Ajou University, Korea
Y
Yves Zhu
Peking University, China
Tiezheng Ge
Tiezheng Ge
Senior staff algorithm engineer, Alimama, Alibaba Group
Computer VisionAIGCRecommender Systems
B
Bo Zheng
Alibaba Group, China
D
Duo Xu
Bigai, China
Y
Yang Liu
Peking University, China
X
Xin Jin
Bigai, China
Feng Yu
Feng Yu
University of Exeter
Efficient AIContinual LearningFederated LearningFoundation Model
S
Songchun Zhu
Bigai, China