Synthetic Lyrics Detection Across Languages and Genres

📅 2024-06-21
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
AI-generated lyrics raise critical concerns regarding copyright infringement, credibility erosion, and content safety. Method: This work introduces the first cross-lingual (12 languages), cross-genre (8 genres), multi-artist benchmark dataset of authentic versus synthetic lyrics, specifically designed for music-text synthesis detection. We propose a novel detection paradigm tailored to lyrical modality characteristics, featuring a few-shot, multi-style, cross-lingual generalization evaluation framework; integrate unsupervised domain adaptation (UDA) to enhance robustness for low-resource languages and unseen genres; and employ human-machine hybrid annotation to ensure label fidelity. Results: State-of-the-art detectors suffer substantial performance degradation on lyrics; UDA optimization yields up to 27.3% F1-score improvement. The framework demonstrates consistent accuracy across diverse evaluation dimensions, establishing the first reproducible, scalable technical benchmark for AI music governance.

๐Ÿ“ Abstract
In recent years, the use of large language models (LLMs) to generate music content, particularly lyrics, has gained popularity. These advances provide valuable tools for artists and enhance their creative processes, but they also raise concerns about copyright violations, consumer satisfaction, and content spamming. Previous research has explored synthetic content detection in various domains. However, no work has focused on lyrics, the text modality of music. To address this gap, we curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists. The generation pipeline was validated using both human and automated methods. We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type. We also investigated methods to adapt the best-performing features to lyrics through unsupervised domain adaptation. Following both music and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual content, and perform on novel genres in few-shot settings. Our findings show promising results that could inform policy decisions around AI-generated music and enhance transparency for users.

Problem

Research questions and friction points this paper is trying to address.

Detect synthetic lyrics across languages and genres
Address copyright and content spamming concerns in AI-generated lyrics
Evaluate and adapt detection methods for multilingual lyric datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated diverse dataset of real and synthetic lyrics
Evaluated existing synthetic text detection approaches
Adapted best-performing features via unsupervised domain adaptation
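
The cross-language evaluation described above can be pictured as scoring one detector separately on each language (or genre) and comparing the results. The snippet below is a minimal sketch of that idea only, not the paper's evaluation code: the function names, the language codes, and the toy labels are all hypothetical, and label 1 is assumed to mean "synthetic".

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = synthetic, by assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # With no positives in either labels or predictions, report 0.0.
    return 2 * tp / denom if denom else 0.0


def f1_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, y_true, y_pred in records:
        grouped[group][0].append(y_true)
        grouped[group][1].append(y_pred)
    return {g: f1_score(t, p) for g, (t, p) in grouped.items()}


# Toy, made-up predictions for two hypothetical language groups.
records = [
    ("en", 1, 1), ("en", 0, 0), ("en", 1, 1), ("en", 0, 1),
    ("ja", 1, 0), ("ja", 0, 0), ("ja", 1, 1), ("ja", 0, 0),
]
scores = f1_by_group(records)
for group, score in sorted(scores.items()):
    print(f"{group}: F1 = {score:.3f}")
```

Reporting the score per group rather than pooled keeps a weak result on one language from being hidden by strong results on another. The same grouping key could just as well be a genre label.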