Fine-grained Fallacy Detection with Human Label Variation

📅 2025-02-19
🤖 AI Summary
This work addresses fine-grained fallacy detection in social media discourse on migration, climate change, and public health, domains characterized by high subjectivity and annotation disagreement, and is the first on this task to explicitly embrace human label variation. To support it, the authors introduce Faina, a novel Italian dataset comprising over 11,000 span-level annotations, with overlaps, across 20 distinct fallacy types, produced by two expert annotators through a multi-round annotation study with discussion. They also design a principled evaluation framework that moves beyond a single ground truth, accommodating multiple equally reliable test sets, partial span matches, overlaps, and the varying severity of labeling errors. Experiments across four fallacy detection setups show that multi-task, multi-label transformer-based approaches are strong baselines in all settings. All data, code, and annotation guidelines are publicly released to advance trustworthy, human-centered misinformation analysis research.

📝 Abstract
We introduce Faina, the first dataset for fallacy detection that embraces multiple plausible answers and natural disagreement. Faina includes over 11K span-level annotations with overlaps across 20 fallacy types on social media posts in Italian about migration, climate change, and public health given by two expert annotators. Through an extensive annotation study that allowed discussion over multiple rounds, we minimize annotation errors whilst keeping signals of human label variation. Moreover, we devise a framework that goes beyond "single ground truth" evaluation and simultaneously accounts for multiple (equally reliable) test sets and the peculiarities of the task, i.e., partial span matches, overlaps, and the varying severity of labeling errors. Our experiments across four fallacy detection setups show that multi-task and multi-label transformer-based approaches are strong baselines across all settings. We release our data, code, and annotation guidelines to foster research on fallacy detection and human label variation more broadly.
Problem

Research questions and friction points this paper is trying to address.

Detecting fallacies in social media posts
Handling human label variation effectively
Evaluating against multiple equally reliable test sets
Innovation

Methods, ideas, or system contributions that make the work stand out.

First fallacy detection dataset embracing multiple plausible answers
Evaluation framework accounting for multiple test sets and partial span matches
Strong multi-task, multi-label transformer-based baselines
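The two evaluation ideas highlighted above, partial span matching and averaging over multiple equally reliable test sets, can be sketched roughly as follows. This is an illustrative approximation, not the paper's exact metric; all function names and the greedy soft-F1 formulation are assumptions for this sketch.

```python
# Sketch: soft span-match F1 evaluated against multiple reference sets.
# Spans are (start, end) character offsets with end exclusive.

def span_overlap(pred, gold):
    """Partial-match score: overlap length / length of the longer span."""
    overlap = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    longest = max(pred[1] - pred[0], gold[1] - gold[0])
    return overlap / longest if longest else 0.0

def soft_f1(preds, golds):
    """Greedy soft precision/recall/F1: each span is credited with its
    best partial match on the other side, rather than exact match only."""
    if not preds or not golds:
        return 1.0 if not preds and not golds else 0.0
    prec = sum(max(span_overlap(p, g) for g in golds) for p in preds) / len(preds)
    rec = sum(max(span_overlap(p, g) for p in preds) for g in golds) / len(golds)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def multi_reference_f1(preds, reference_sets):
    """Score predictions against each annotator's reference set
    (treated as equally reliable) and average the results."""
    return sum(soft_f1(preds, refs) for refs in reference_sets) / len(reference_sets)
```

For example, a prediction covering characters 0-10 scores 1.0 against an identical annotation and 0.5 against an annotator who marked only characters 5-10, for an averaged score of 0.75; exact-match evaluation would instead give 0.5, discarding the partial agreement.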
Alan Ramponi
Digital Humanities group, Fondazione Bruno Kessler, Italy
Agnese Daffara
Department of Humanities, University of Pavia, Italy; Institute for Natural Language Processing, University of Stuttgart, Germany
Sara Tonelli
Head of Research Unit, Fondazione Bruno Kessler
Natural Language Processing · Digital Humanities · Artificial Intelligence