Training Data Size Sensitivity in Unsupervised Rhyme Recognition

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenges in multilingual rhyme detection stemming from ambiguous definitions and inconsistent annotations, as well as the lack of systematic investigation into training data requirements. Leveraging RhymeTagger—an unsupervised, language-agnostic tool—the authors conduct the first quantitative analysis of the relationship between training data scale and rhyme recognition performance across poetic corpora in seven languages, introducing human annotation consistency as a realistic benchmark. Through phonetic feature analysis and comparative experiments with three large language models (LLMs), the study finds that RhymeTagger surpasses human annotation consistency given sufficient data, whereas LLMs lacking explicit phonetic representations perform markedly worse. The work further reveals how lexical distance and phonetic similarity influence annotation disagreement, offering new benchmarks and methodological insights for computational rhyme analysis across languages.
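The summary above describes RhymeTagger as identifying rhymes from repeating patterns in poetry corpora. RhymeTagger's actual algorithm is not detailed on this page, so the following is only a toy sketch of the general idea: count how often pairs of line endings co-occur at line ends in a training corpus, then label a stanza's rhyme scheme from those counts. The `ending` heuristic (last two letters as a stand-in for phonetic similarity), the threshold, and the corpus counts are all hypothetical illustrations, not the tool's real components.

```python
from collections import Counter
from itertools import combinations

def ending(word, n=2):
    # Crude orthographic proxy for a phonetic ending: the last n letters.
    return word[-n:].lower()

def rhyme_scheme(stanza, corpus_counts, threshold=2):
    """Assign rhyme letters ('a', 'b', ...) to lines whose endings
    co-occur often enough in the (toy) training corpus; 'x' = unrhymed."""
    finals = [line.split()[-1] for line in stanza]
    labels = ["x"] * len(finals)
    next_label = "a"
    for i, j in combinations(range(len(finals)), 2):
        pair = frozenset((ending(finals[i]), ending(finals[j])))
        if corpus_counts[pair] >= threshold:
            if labels[i] == "x":
                labels[i] = next_label
                next_label = chr(ord(next_label) + 1)
            labels[j] = labels[i]
    return "".join(labels)

# Hypothetical corpus statistics: how often each ending pair was seen
# together at line ends within the same stanza during training.
counts = Counter({frozenset(("ay", "ay")): 5, frozenset(("ee", "ee")): 3})
stanza = ["The sun will stay", "beyond the tree",
          "until the day", "what we can see"]
print(rhyme_scheme(stanza, counts))  # → abab
```

The sketch also hints at why training data size matters: with too small a corpus, the co-occurrence counts never clear the threshold and every line stays unrhymed.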
📝 Abstract
Rhyme is deceptively intuitive: what is or is not a rhyme is constructed historically, scholars struggle with rhyme classification, and people disagree on whether two words rhyme. This complicates automated rhyme recognition and evaluation, especially in a multilingual context. This article investigates how much training data is needed for reliable unsupervised rhyme recognition using RhymeTagger, a language-independent tool that identifies rhymes based on repeating patterns in poetry corpora. We evaluate its performance across seven languages (Czech, German, English, French, Italian, Russian, and Slovene), examining how training size and language differences affect accuracy. To set a realistic performance benchmark, we assess inter-annotator agreement on a manually annotated subset of poems and analyze factors contributing to disagreement in expert annotations: phonetic similarity between rhyming words and their distance from each other in a poem. We also compare RhymeTagger to three large language models using a one-shot learning strategy. Our findings show that, once provided with sufficient training data, RhymeTagger consistently outperforms human agreement, while LLMs lacking phonetic representation significantly struggle with the task.
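The abstract uses inter-annotator agreement as a realistic performance ceiling. The paper's exact agreement metric is not stated on this page; Cohen's kappa is one standard choice for two annotators, sketched here on hypothetical binary labels (1 = the pair rhymes, 0 = it does not).

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(ann_a) == len(ann_b)
    n = len(ann_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected chance agreement from each annotator's label marginals.
    ca, cb = Counter(ann_a), Counter(ann_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical judgments on ten candidate rhyme pairs by two experts.
a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
b = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
print(round(cohens_kappa(a, b), 3))  # → 0.583
```

A system whose agreement with the gold annotations exceeds this kappa "outperforms human agreement" in the sense the abstract describes.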
Problem

Research questions and friction points this paper is trying to address.

unsupervised rhyme recognition
training data size
multilingual context
inter-annotator agreement
rhyme classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

unsupervised rhyme recognition
training data sensitivity
inter-annotator agreement
phonetic representation
multilingual evaluation
Petr Plecháč
Institute of Czech Literature, Czech Academy of Sciences
metrics, versification, quantitative verse analysis, prosody, stylometry
Artjoms Šeļa
University of Tartu, Estonia
Silvie Cinková
Charles University, Czechia
Mirella De Sisto
Tilburg University
Lara Nugues
University of Basel, Switzerland
Neža Kočnik
University of Ljubljana, Slovenia
Antonina Martynenko
Institute of Czech Literature, Czech Academy of Sciences, Czechia
Ben Nagy
Institute of Polish Language, Polish Academy of Sciences, Poland
Luca Giovannini
University of Potsdam, Germany
Robert Kolár
Institute of Czech Literature, Czech Academy of Sciences, Czechia