Yesterday's News: Benchmarking Multi-Dimensional Out-of-Distribution Generalisation of Misinformation Detection Models

📅 2024-10-12
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Misinformation detection models must cope with multi-dimensional out-of-distribution (OOD) shifts, yet standard in-distribution evaluations rarely expose this weakness. This paper introduces *misinfo-general*, a benchmark for multi-dimensional OOD generalization in misinformation detection, covering six shift axes: time, event, topic, publisher, political bias, and misinformation type. The benchmark uses distant (publisher-level) labelling to simulate covariate shifts at scale, and evaluates a common class of baseline models along each axis. Experiments show substantial performance degradation under these shifts; using article metadata, the authors demonstrate failures of key desiderata that classification metrics alone do not reveal. An analysis of the data indicates limited presence of modelling shortcuts. The dataset and accompanying code are publicly released to support reproducible robustness research.

📝 Abstract
This article introduces misinfo-general, a benchmark dataset for evaluating misinformation models' ability to perform out-of-distribution generalization. Misinformation changes rapidly, much more quickly than moderators can annotate at scale, resulting in a shift between the training and inference data distributions. As a result, misinformation detectors need to be able to perform out-of-distribution generalization, an attribute they currently lack. Our benchmark uses distant labelling to enable simulating covariate shifts in misinformation content. We identify time, event, topic, publisher, political bias, and misinformation type as important axes for generalization, and we evaluate a common class of baseline models on each. Using article metadata, we show how these models fail important desiderata, which is not necessarily obvious from classification metrics. Finally, we analyze properties of the data to ensure limited presence of modelling shortcuts. We make the dataset and accompanying code publicly available: https://github.com/ioverho/misinfo-general
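The evaluation setup described above can be illustrated with a minimal sketch. The idea is to hold out one value of a metadata axis (e.g. one publisher, one year, one topic) at a time, so the test set is out-of-distribution with respect to that axis. The field names and toy corpus below are hypothetical, not taken from the misinfo-general dataset:

```python
from collections import defaultdict

def leave_one_group_out_splits(articles, axis):
    """Yield (held_out, train, test) splits that hold out one value of a
    metadata axis at a time, making the test set OOD along that axis."""
    groups = defaultdict(list)
    for article in articles:
        groups[article[axis]].append(article)
    for held_out, test in groups.items():
        # Train on every group except the held-out one.
        train = [a for g, items in groups.items() if g != held_out for a in items]
        yield held_out, train, test

# Toy corpus: each article carries a metadata field and a distant label.
corpus = [
    {"text": "...", "publisher": "A", "label": 1},
    {"text": "...", "publisher": "A", "label": 1},
    {"text": "...", "publisher": "B", "label": 0},
    {"text": "...", "publisher": "C", "label": 0},
]

for held_out, train, test in leave_one_group_out_splits(corpus, "publisher"):
    print(f"held out {held_out}: {len(train)} train / {len(test)} test")
```

The same splitting function works for any of the six axes, provided each article is annotated with the corresponding metadata field.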
Problem

Research questions and friction points this paper is trying to address.

Evaluating misinformation models' out-of-distribution generalization ability
Addressing distribution shifts between training and inference data
Identifying key axes for generalization in misinformation detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses distant labelling to simulate covariate shifts at scale
Evaluates models along multiple generalization axes
Analyzes the data to verify limited presence of modelling shortcuts