KoWit-24: A Richly Annotated Dataset of Wordplay in News Headlines

📅 2025-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing humor datasets inadequately model wordplay in news headlines, especially transformations of idioms, fixed expressions, and named entities. Method: We construct a Russian news headline dataset with fine-grained wordplay annotation (2,700 instances), labeled for presence, type, anchor, and referent, and pair each headline with its authentic news lead and summary to provide context. We propose a context-aware wordplay typology, design a human annotation protocol, and establish an LLM-based evaluation benchmark. Contribution/Results: Experiments show that five LLMs perform poorly on both the detection and the explanation task, leaving ample room for improvement. The dataset and evaluation scripts are publicly released as a resource and benchmark for computational humor research.

📝 Abstract

We present KoWit-24, a dataset with fine-grained annotation of wordplay in 2,700 Russian news headlines. KoWit-24 annotations include the presence of wordplay, its type, wordplay anchors, and words/phrases the wordplay refers to. Unlike the majority of existing humor collections of canned jokes, KoWit-24 provides wordplay contexts -- each headline is accompanied by the news lead and summary. The most common type of wordplay in the dataset is the transformation of collocations, idioms, and named entities -- the mechanism that has been underrepresented in previous humor datasets. Our experiments with five LLMs show that there is ample room for improvement in wordplay detection and interpretation tasks. The dataset and evaluation scripts are available at https://github.com/Humor-Research/KoWit-24
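The annotation layers named in the abstract (presence of wordplay, type, anchors, referents, plus the lead and summary as context) can be pictured as a single record per headline. The following is a minimal sketch only: the class `HeadlineRecord`, its field names, and the helper `detection_scores` are hypothetical illustrations, not the schema or scripts in the KoWit-24 repository, and precision/recall/F1 are assumed here as a plausible way to score binary detection rather than the metrics the authors report.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HeadlineRecord:
    """Hypothetical layout of one annotated headline (names are assumptions)."""
    headline: str                        # the news headline itself
    lead: str                            # news lead accompanying the headline
    summary: str                         # news summary accompanying the headline
    has_wordplay: bool                   # presence annotation
    wordplay_type: Optional[str] = None  # type label, if wordplay is present
    anchors: List[str] = field(default_factory=list)    # words carrying the wordplay
    referents: List[str] = field(default_factory=list)  # words/phrases referred to


def detection_scores(gold: List[bool], predicted: List[bool]) -> Dict[str, float]:
    """Precision, recall, and F1 for binary wordplay detection (assumed metrics)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder records; the strings are stand-ins, not dataset content.
records = [
    HeadlineRecord("<headline 1>", "<lead 1>", "<summary 1>", True,
                   wordplay_type="<type>", anchors=["<anchor>"],
                   referents=["<referent>"]),
    HeadlineRecord("<headline 2>", "<lead 2>", "<summary 2>", False),
]
gold_labels = [r.has_wordplay for r in records]
model_labels = [True, True]  # hypothetical model output for the two records
scores = detection_scores(gold_labels, model_labels)
```

Keeping the lead and summary on the same record as the headline reflects the dataset's stated emphasis on context: a model or annotator can be shown the headline alone or together with its context without any extra lookup.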

Problem

Research questions and friction points this paper is trying to address.

Detecting and interpreting wordplay in Russian news headlines.

Providing context-rich annotations for wordplay types and mechanisms.

Evaluating and improving wordplay detection in large language models.

Innovation

Methods, ideas, or system contributions that make the work stand out.

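Fine-grained annotation of wordplay in headlines: presence, type, anchors, and referents.

Context for every headline in the form of the news lead and summary.

Focus on an underrepresented wordplay mechanism: the transformation of collocations, idioms, and named entities.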
