Deep learning for music generation. Four approaches and their comparative evaluation

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Evaluating the aesthetic quality and applicability of different AI paradigms for melody generation is difficult, because these paradigms rest on different modeling assumptions and are judged by different criteria. Method: This study compares four approaches: a modified Vision Transformer used as a language model, a classic Transformer combined with chat sonification, a classic Transformer combined with Schillinger's rhythm theory, and OpenAI's GPT-3. Together they cover cross-modal modeling, sonification, injection of music-theoretic priors, and large language model (LLM) prompting. The new element is a rhythm modeling method grounded in Schillinger’s rhythm theory, which gives the Transformer explicit rhythmic structure. Contribution/Results: The Schillinger-based Transformer produces better-sounding melodies than the previously published sonification method, which supports the value of domain-specific musical knowledge for generation. GPT-3 produced the most pleasing melodies overall. The work offers a side-by-side comparison of these four generative pathways for algorithmic melody generation.
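
The summary does not say how the Schillinger prior is encoded. As a hedged illustration only: the best-known construct in Schillinger's rhythm theory is the resultant of interference. Two periodic pulses, a and b, are superimposed over their common period, and the gaps between consecutive attacks form a new rhythm (3 against 2 gives the durations 2, 1, 1, 2). The sketch below computes that resultant and turns it into duration tokens that a Transformer could consume. The function names and the "DUR_<n>" token format are hypothetical and are not taken from the paper.

```python
from math import gcd


def schillinger_resultant(a: int, b: int) -> list[int]:
    """Resultant of interference of two periodic generators a and b.

    Attacks fall on every multiple of a and every multiple of b within one
    common period (the least common multiple, which equals a * b when a and b
    are coprime, as in Schillinger's examples). The resultant is the list of
    durations between consecutive attacks, in the smallest time unit.
    """
    if a <= 0 or b <= 0:
        raise ValueError("generators must be positive integers")
    period = a * b // gcd(a, b)  # least common multiple of a and b
    attacks = sorted(set(range(0, period, a)) | set(range(0, period, b)))
    attacks.append(period)  # close the cycle at the end of the period
    return [nxt - cur for cur, nxt in zip(attacks, attacks[1:])]


def rhythm_tokens(durations: list[int]) -> list[str]:
    """Map durations to tokens; the "DUR_<n>" format is hypothetical, not the paper's."""
    return [f"DUR_{d}" for d in durations]


if __name__ == "__main__":
    r = schillinger_resultant(3, 2)
    print(r)                 # [2, 1, 1, 2]
    print(rhythm_tokens(r))  # ['DUR_2', 'DUR_1', 'DUR_1', 'DUR_2']
```

For instance, schillinger_resultant(4, 3) yields [3, 1, 2, 2, 1, 3], the classic 4-against-3 resultant. Token sequences like these could serve as a rhythmic skeleton for a model to fill with pitches, though that pairing is an assumption of this sketch.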

📝 Abstract
This paper introduces four artificial intelligence algorithms for music generation. It compares them not only on the aesthetic quality of the generated music but also on their suitability for specific applications. The first set of melodies is produced by a slightly modified visual transformer neural network used as a language model. The second set is generated by combining chat sonification with a classic transformer neural network (the same music generation method was presented in previous research). The third set is generated by combining the Schillinger rhythm theory with a classic transformer neural network, and the fourth set is generated using the GPT-3 transformer provided by OpenAI. A comparative analysis of the melodies generated by these approaches shows significant differences between them. In terms of aesthetic value, GPT-3 produced the most pleasing melodies, and the newly introduced Schillinger method generated better-sounding music than previous sonification methods.
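
The abstract does not state how melodies are presented to the visual transformer. Here is a minimal sketch, assuming a binary piano-roll (pitch rows by time-step columns) serves as the "image". The roll is cut into non-overlapping tiles, and each tile is flattened into one vector, which is the standard ViT patching step. All names, the (pitch, start, duration) note format, and the time-major patch order are illustrative assumptions, not details from the paper.

```python
# Hypothetical representation: the paper's actual input format is not described.
Note = tuple[int, int, int]  # (MIDI pitch, start step, duration in steps)


def piano_roll(melody: list[Note], lowest_pitch: int, num_pitches: int, num_steps: int) -> list[list[int]]:
    """Binary piano roll: roll[p][t] == 1 when pitch lowest_pitch + p sounds at step t."""
    roll = [[0] * num_steps for _ in range(num_pitches)]
    for pitch, start, duration in melody:
        row = pitch - lowest_pitch
        if not 0 <= row < num_pitches:
            raise ValueError(f"pitch {pitch} lies outside the roll")
        for t in range(start, min(start + duration, num_steps)):
            roll[row][t] = 1
    return roll


def to_patches(roll: list[list[int]], patch_h: int, patch_w: int) -> list[list[int]]:
    """Cut the roll into non-overlapping patch_h x patch_w tiles and flatten each one.

    Tiles are emitted time-major (all pitch bands of one time window, then the
    next window), so the patch sequence follows the music from left to right.
    """
    height, width = len(roll), len(roll[0])
    if height % patch_h or width % patch_w:
        raise ValueError("roll dimensions must be divisible by the patch size")
    patches = []
    for col in range(0, width, patch_w):
        for row in range(0, height, patch_h):
            patches.append(
                [roll[r][c] for r in range(row, row + patch_h) for c in range(col, col + patch_w)]
            )
    return patches


if __name__ == "__main__":
    melody = [(60, 0, 2), (62, 2, 2), (64, 4, 4)]  # C, D, E
    roll = piano_roll(melody, lowest_pitch=60, num_pitches=8, num_steps=8)
    patches = to_patches(roll, patch_h=4, patch_w=4)
    print(len(patches), [sum(p) for p in patches])  # 4 [4, 0, 0, 4]
```

A standard ViT reads patches in raster (row-major) order. Ordering them by time window first keeps the patch sequence aligned with the music, which suits next-token prediction when the network is used as a language model.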
Problem

Research questions and friction points this paper is trying to address.

Compare four AI algorithms for music generation
Evaluate aesthetic quality and application suitability
Assess differences in generated melodies' performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modified visual transformer for melody generation
Combined Schillinger rhythm theory with a transformer network
GPT-3 transformer produced the most pleasing melodies