Te Ahorré Un Click: A Revised Definition of Clickbait and Detection in Spanish News

📅 2025-07-13
🤖 AI Summary
Problem: Existing definitions of clickbait lack conceptual clarity and fail to distinguish it rigorously from related phenomena such as sensationalism or headline-content mismatch. Method: We propose the "curiosity gap" (the deliberate omission of key information to provoke epistemic curiosity) as the defining characteristic of clickbait, thereby redrawing its conceptual boundaries. Using a multi-annotator agreement protocol (Fleiss' κ = 0.825), we construct TA1C, the first open-source, human-annotated clickbait detection dataset for Spanish news, comprising 3,500 tweets. We further establish strong baselines that reach an F1-score of 0.84. Contribution/Results: This work strengthens the theoretical foundation of clickbait research and addresses two gaps in Spanish-language NLP resources: the lack of high-quality, linguistically grounded clickbait annotations and the lack of a reproducible benchmark for clickbait detection.
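The agreement figure above is Fleiss' κ. It compares the mean observed per-item agreement among annotators, P̄, with the agreement expected by chance from the overall label distribution, P̄ₑ, as κ = (P̄ − P̄ₑ) / (1 − P̄ₑ). The minimal sketch below computes it from a count matrix in plain Python. The function name `fleiss_kappa` and the toy counts are illustrative assumptions and are not taken from the TA1C release. This page also does not say how many annotators labelled each tweet.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a count matrix.

    counts[i][j] is the number of annotators who put item i in category j.
    Every item must be rated by the same number n >= 2 of annotators.
    """
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one item")
    k = len(counts[0])
    n = sum(counts[0])
    if n < 2 or any(len(row) != k or sum(row) != n for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement for each item: share of agreeing annotator pairs.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / N

    # Chance agreement from the overall proportion of each category.
    p_cat = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1:
        raise ValueError("chance agreement is 1, so kappa is undefined")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical example: 3 annotators, columns = [clickbait, not clickbait].
    toy_counts = [[3, 0], [2, 1], [0, 3], [1, 2], [3, 0]]
    print(fleiss_kappa(toy_counts))
```

For binary clickbait labels, each row has two columns. For example, `[3, 0]` means three hypothetical annotators all chose "clickbait".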

📝 Abstract
We revise the definition of clickbait, on which there is currently no consensus. We argue that the creation of a curiosity gap is the key concept that distinguishes clickbait from related phenomena such as sensationalism and headlines that do not deliver what they promise or that diverge from the article. We therefore propose a new definition: clickbait is a technique for generating headlines and teasers that deliberately omit part of the information with the goal of raising readers' curiosity, capturing their attention, and enticing them to click. We introduce a new approach to creating clickbait detection datasets. It refines the boundaries of the concept and the annotation criteria in order to minimize subjectivity in annotation decisions as much as possible. Following this approach, we created and release TA1C (for Te Ahorré Un Click, Spanish for "Saved You A Click"), the first open-source dataset for clickbait detection in Spanish. It consists of 3,500 tweets from 18 well-known media sources, manually annotated with an inter-annotator agreement of 0.825 (Fleiss' κ). We implement strong baselines that achieve an F1-score of 0.84.
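The abstract reports an F1-score of 0.84. This page does not state whether that is the F1 of the clickbait class or a macro average over both classes. The minimal sketch below computes both from gold and predicted binary labels. The encoding 1 = clickbait, 0 = not clickbait and the function names `f1_for_class` and `macro_f1` are assumptions for illustration only.

```python
from typing import Hashable, Sequence, Tuple


def f1_for_class(gold: Sequence[Hashable], pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 of a single class: harmonic mean of its precision and recall."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable],
             labels: Tuple[Hashable, ...] = (0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(gold, pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy labels: 1 = clickbait, 0 = not clickbait.
    gold = [1, 1, 1, 0]
    pred = [1, 1, 0, 0]
    print("clickbait-class F1:", f1_for_class(gold, pred, 1))
    print("macro F1:", macro_f1(gold, pred))
```

The two averages can differ noticeably when the classes are imbalanced. It is therefore worth checking which one the paper uses before comparing results against its baselines.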
Problem

Redefining clickbait by focusing on curiosity gap creation
Creating a standardized dataset for Spanish clickbait detection
Developing accurate baselines for clickbait detection in Spanish
Innovation

Revised clickbait definition focusing on curiosity gap
New dataset creation approach minimizing annotation subjectivity
First open-source Spanish clickbait detection dataset
Gabriel Mordecki
Universidad de la República, Montevideo, Uruguay
Guillermo Moncecchi
Profesor Adjunto, Universidad de la República, Uruguay
Natural Language Processing
Javier Couto
PEDECIBA, Uruguay