🤖 AI Summary
This work addresses the challenge of automatically generating music for ancient Chinese Ci poetry, such as the lyrical songs of the Song dynasty (SongCi). Existing models struggle to reproduce this music because of its distinctive rhythms and styles. To this end, we propose SongSong, a staged generation model that first predicts a melody from the text of a SongCi poem, then separately generates the singing voice and the instrumental accompaniment from that melody, and finally combines them into a complete piece. To the authors' knowledge, SongSong is the first model capable of restoring SongCi music. The work also introduces OpenSongSong, a 29.9-hour dataset of ancient SongCi music. Evaluations on 85 SongCi sentences unseen during training show that SongSong outperforms music generation platforms such as Suno and SkyMusic on both subjective and objective measures.
📝 Abstract
Recently, there have been significant advances in music generation. However, existing models focus primarily on modern pop songs, which makes it difficult to produce ancient music with distinctive rhythms and styles, such as ancient Chinese SongCi. In this paper, we introduce SongSong, to our knowledge the first music generation model capable of restoring Chinese SongCi. Our model first predicts the melody from the input SongCi, then separately generates the singing voice and the accompaniment from that melody, and finally combines all elements into the final piece of music. Additionally, to address the lack of ancient music datasets, we create OpenSongSong, a comprehensive dataset of ancient Chinese SongCi music featuring 29.9 hours of compositions by various renowned SongCi music masters. To assess how well SongSong performs SongCi, we randomly select 85 SongCi sentences that were not part of the training set and use them to compare SongSong with music generation platforms such as Suno and SkyMusic. Both the subjective and objective results indicate that our proposed model achieves leading performance in generating high-quality SongCi music.
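The abstract describes a three-stage pipeline (melody prediction, then separate singing-voice and accompaniment generation, then combination) but does not give its interfaces. The following minimal Python sketch shows only that control flow and is not the authors' implementation. `StagedPipeline`, `mix`, the one-MIDI-pitch-per-character melody representation, and the mixing gains are all hypothetical. Each stage is a stub where a learned model would go, and simple gain-and-sum mixing is an assumed stand-in for however SongSong actually combines its tracks.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical representations: the abstract does not specify them.
Melody = List[int]   # e.g. one MIDI pitch per lyric character
Audio = List[float]  # mono waveform samples, nominally in [-1.0, 1.0]


def mix(voice: Audio, accompaniment: Audio,
        voice_gain: float = 1.0, accompaniment_gain: float = 0.5) -> Audio:
    """Sum two tracks sample by sample, zero-padding the shorter one and
    rescaling only if the result would clip. Gains are illustrative."""
    length = max(len(voice), len(accompaniment))
    mixed = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0.0
        a = accompaniment[i] if i < len(accompaniment) else 0.0
        mixed.append(voice_gain * v + accompaniment_gain * a)
    peak = max((abs(x) for x in mixed), default=0.0)
    if peak > 1.0:
        # Peak-normalize so the loudest sample sits exactly at 1.0.
        mixed = [x / peak for x in mixed]
    return mixed


@dataclass
class StagedPipeline:
    """Melody -> (singing voice, accompaniment) -> mix.
    Each field is a placeholder callable standing in for a learned model."""
    predict_melody: Callable[[str], Melody]
    synthesize_voice: Callable[[str, Melody], Audio]
    generate_accompaniment: Callable[[Melody], Audio]

    def run(self, lyrics: str) -> Audio:
        melody = self.predict_melody(lyrics)                 # stage 1: text -> melody
        voice = self.synthesize_voice(lyrics, melody)        # stage 2a: singing voice
        accompaniment = self.generate_accompaniment(melody)  # stage 2b: backing track
        return mix(voice, accompaniment)                     # stage 3: combine


if __name__ == "__main__":
    # Toy stand-in stages so the sketch runs end to end.
    pipeline = StagedPipeline(
        predict_melody=lambda text: [60 + (ord(ch) % 12) for ch in text],
        synthesize_voice=lambda text, melody: [0.8] * len(melody),
        generate_accompaniment=lambda melody: [0.6] * len(melody),
    )
    print(pipeline.run("明月几时有"))  # a sample SongCi line, five characters
```

With these toy stages, each mixed sample is 0.8 + 0.5 × 0.6 = 1.1, which exceeds 1.0 and is rescaled, so the script prints five values of 1.0. In this sketch, both the voice and accompaniment stages consume the same predicted melody. That shared melody is what allows the two tracks to be generated separately and still stay aligned when they are combined.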