🤖 AI Summary
Conventional tokenizations for symbolic music analysis encode pitch as absolute MIDI values, which limits both expressive power and interpretability. Method: This paper proposes an interval-based note tokenization paradigm that replaces absolute pitches with relative interval relationships as the fundamental semantic units, yielding a general, structure-aware tokenization framework. The tokenizations are built from MIDI interval computation, designed for Transformer architectures, and evaluated uniformly across three tasks: key analysis, melody prediction, and harmonic recognition. Results: Interval-based tokenizations improve accuracy on all three tasks. Moreover, the models' attention concentrates on musically meaningful structural features, such as stepwise motion and leaps, which facilitates explainability and connects music-theoretic insight with neural language modeling.
📝 Abstract
Symbolic music analysis tasks are often performed by models originally developed for Natural Language Processing, such as Transformers. Such models require the input data to be represented as sequences, which is achieved through a process of tokenization. Tokenization strategies for symbolic music often rely on absolute MIDI values to represent pitch information. However, music research largely promotes the benefit of higher-level representations such as melodic contour and harmonic relations, for which pitch intervals turn out to be more expressive than absolute pitches. In this work, we introduce a general framework for building interval-based tokenizations. By evaluating these tokenizations on three music analysis tasks, we show that interval-based tokenizations improve model performance and facilitate explainability.
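To make the core idea concrete, here is a minimal sketch of what an interval-based tokenization might look like: absolute MIDI pitches are replaced by signed intervals (in semitones) from the preceding note, so the token sequence encodes melodic contour directly. The function names and token format below are illustrative assumptions, not the paper's actual vocabulary.

```python
def tokenize_intervals(midi_pitches):
    """Convert a monophonic sequence of MIDI pitch numbers into
    interval tokens relative to the preceding note."""
    if not midi_pitches:
        return []
    # Anchor the sequence with the first absolute pitch so that the
    # original melody can be reconstructed; all subsequent tokens are
    # relative intervals, which are transposition-invariant.
    tokens = [f"Pitch_{midi_pitches[0]}"]
    for prev, curr in zip(midi_pitches, midi_pitches[1:]):
        tokens.append(f"Interval_{curr - prev:+d}")
    return tokens

def detokenize_intervals(tokens):
    """Invert the tokenization back to absolute MIDI pitches."""
    pitches = [int(tokens[0].split("_")[1])]
    for tok in tokens[1:]:
        pitches.append(pitches[-1] + int(tok.split("_")[1]))
    return pitches

# A C-major arpeggio (C4 E4 G4 C5) becomes contour-aware tokens:
# tokenize_intervals([60, 64, 67, 72])
# -> ['Pitch_60', 'Interval_+4', 'Interval_+3', 'Interval_+5']
```

Because the interval tokens are identical for any transposition of the same melody, a model trained on such sequences sees melodic shape (steps vs. leaps) explicitly rather than having to infer it from absolute pitch values.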