🤖 AI Summary
Deep learning models (e.g., Transformers, diffusion models) for symbolic music generation suffer from high computational overhead, hindering deployment on commodity CPUs.
Method: This paper proposes a lightweight, parameter-free generative paradigm grounded in compression. It presents the first application of sequential probability assignment (SPA), derived from the LZ78 lossless compression algorithm, to symbolic music modeling, establishing a compression-driven generative framework with provable convergence guarantees. The method operates directly on discrete MIDI sequences with no training, or at negligible training cost.
Contribution/Results: Experiments show generation quality competitive with state-of-the-art diffusion models (comparable FAD, WD, and KL divergence scores), while achieving 30× faster training and 300× faster inference. Crucially, the method enables real-time, CPU-only music generation, significantly improving practicality, accessibility, and scalability in resource-constrained environments.
📝 Abstract
Recent advances in symbolic music generation primarily rely on deep learning models such as Transformers, GANs, and diffusion models. While these approaches achieve high-quality results, they require substantial computational resources, limiting their scalability. We introduce LZMidi, a lightweight symbolic music generation framework based on a Lempel-Ziv (LZ78)-induced sequential probability assignment (SPA). By leveraging the discrete and sequential structure of MIDI data, our approach enables efficient music generation on standard CPUs with minimal training and inference costs. Theoretically, we establish universal convergence guarantees for our approach, underscoring its reliability and robustness. Compared to state-of-the-art diffusion models, LZMidi achieves competitive Fréchet Audio Distance (FAD), Wasserstein Distance (WD), and Kullback-Leibler (KL) scores, while significantly reducing computational overhead: up to 30× faster training and 300× faster generation. Our results position LZMidi as a significant advancement in compression-based learning, highlighting how universal compression techniques can efficiently model and generate structured sequential data, such as symbolic music, with practical scalability and theoretical rigor.
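To make the core idea concrete, here is a minimal sketch of an LZ78-induced sequential probability assignment over a discrete token alphabet (as MIDI events would be after tokenization). This is an illustrative toy, not the paper's implementation: the class name `LZ78SPA`, the smoothing constant `gamma`, and the Dirichlet-style additive smoothing are assumptions chosen for clarity. The sketch maintains the LZ78 parse tree, keeps next-symbol counts at each node, and restarts at the root whenever a phrase ends, mirroring LZ78 parsing.

```python
import random


class LZ78SPA:
    """Toy LZ78-induced sequential probability assignment (SPA).

    Walks the LZ78 parse tree as symbols arrive; at the current node,
    next-symbol probabilities are smoothed empirical counts
    (additive constant `gamma`, an illustrative choice).
    """

    def __init__(self, alphabet_size, gamma=0.5):
        self.A = alphabet_size
        self.gamma = gamma
        self.root = {"children": {}, "counts": [0] * alphabet_size}
        self.node = self.root  # current position in the parse tree

    def probs(self):
        """Smoothed next-symbol distribution at the current node."""
        counts = self.node["counts"]
        denom = sum(counts) + self.gamma * self.A
        return [(c + self.gamma) / denom for c in counts]

    def update(self, symbol):
        """Record `symbol` and advance along the tree (LZ78 traversal)."""
        self.node["counts"][symbol] += 1
        child = self.node["children"].get(symbol)
        if child is None:
            # End of an LZ78 phrase: grow a leaf, restart at the root.
            self.node["children"][symbol] = {
                "children": {}, "counts": [0] * self.A}
            self.node = self.root
        else:
            self.node = child

    def sample(self, rng=random):
        """Draw one symbol from the current SPA distribution."""
        r, acc = rng.random(), 0.0
        for a, p in enumerate(self.probs()):
            acc += p
            if r < acc:
                return a
        return self.A - 1
```

Generation then amounts to repeatedly calling `sample` and feeding the drawn token back through `update`, so the same tree that would drive LZ78 compression drives sequence synthesis; no gradient training is involved.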