AI Summary
This study addresses the unclear effectiveness of adapting large language models (LLMs) to symbolic music tasks and the lack of systematic comparisons among fine-tuning strategies. We present the first comprehensive evaluation of supervised fine-tuning and preference optimization methods, assessing general-purpose instruction-tuned LLMs, domain-adapted variants, and specialized music LLMs across multiple ABC notation corpora on both generation and understanding tasks. Our experiments reveal a trade-off between effective domain adaptation and preservation of pre-trained knowledge, while also uncovering inconsistencies in commonly used automatic evaluation metrics within the symbolic music context. These findings offer empirical insights and practical guidance for future research on LLM adaptation in symbolic music processing.
Abstract
Music often shares notable parallels with language, motivating the use of pre-trained large language models (LLMs) for symbolic music understanding and generation. Despite growing interest, the practical effectiveness of adapting instruction-tuned LLMs to symbolic music remains insufficiently characterized. We present a controlled comparative study of fine-tuning strategies for ABC-based generation and understanding, comparing an off-the-shelf instruction-tuned backbone to domain-adapted variants and a music-specialized LLM baseline. Across multiple symbolic music corpora and evaluation signals, we provide insights into adaptation choices for symbolic music applications. We highlight the trade-off between domain adaptation and preservation of prior knowledge, as well as the distinct behaviour of metrics commonly used to measure domain adaptation for symbolic music.