CoALFake: Collaborative Active Learning with Human-LLM Co-Annotation for Cross-Domain Fake News Detection

📅 2026-04-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing fake news detection methods, which often exhibit poor generalization, rely heavily on large amounts of labeled data, and struggle to capture domain-specific characteristics. To overcome these challenges, the authors propose a cross-domain detection framework that integrates a human–large language model (LLM) collaborative annotation mechanism with a domain-aware active learning strategy. The framework leverages LLMs to generate high-quality pseudo-labels and combines domain embedding representation learning with diversity-based sampling to substantially reduce annotation costs while enhancing the model’s capacity to capture cross-domain features. Experimental results demonstrate that the proposed approach significantly outperforms current baselines across multiple benchmark datasets and maintains strong performance even under minimal human supervision.
📝 Abstract
The proliferation of fake news across diverse domains highlights critical limitations in current detection systems, which often exhibit narrow domain specificity and poor generalization. Existing cross-domain approaches face two key challenges: (1) reliance on labelled data, which is frequently unavailable and resource-intensive to acquire, and (2) information loss caused by rigid domain categorization or neglect of domain-specific features. To address these issues, we propose CoALFake, a novel approach for cross-domain fake news detection that integrates Human-Large Language Model (LLM) co-annotation with domain-aware Active Learning (AL). Our method employs LLMs for scalable, low-cost annotation while maintaining human oversight to ensure label reliability. By integrating domain embedding techniques, CoALFake dynamically captures both domain-specific nuances and cross-domain patterns, enabling the training of a domain-agnostic model. Furthermore, a domain-aware sampling strategy optimizes sample acquisition by prioritizing diverse domain coverage. Evaluations across several datasets show that CoALFake consistently outperforms a range of existing baselines, even with minimal human oversight. These results indicate that human-LLM co-annotation is a highly cost-effective way to reach strong detection performance.
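The abstract does not spell out the sampling or annotation procedure, so the following is only a minimal sketch of how domain-aware sample selection and human-LLM co-annotation could fit together. Everything in it is an assumption made for illustration, not the authors' method: the round-robin selection over domains, the per-item uncertainty scores, the `min_confidence` threshold, and the function names `domain_aware_sample`, `co_annotate`, `llm_label`, and `human_label` are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical pool entry: (item_id, domain, model_uncertainty).
PoolItem = Tuple[str, str, float]


def domain_aware_sample(pool: List[PoolItem], budget: int) -> List[str]:
    """Pick up to `budget` items, cycling over domains so that every
    domain is represented before any domain contributes a second item.
    Within a domain, the most uncertain items are taken first.
    (Assumed strategy; the paper's actual criterion may differ.)"""
    by_domain: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for item_id, domain, uncertainty in pool:
        by_domain[domain].append((uncertainty, item_id))
    for items in by_domain.values():
        items.sort(reverse=True)  # highest uncertainty first

    domains = sorted(by_domain)  # fixed order keeps the result deterministic
    selected: List[str] = []
    while len(selected) < budget and any(by_domain[d] for d in domains):
        for d in domains:
            if by_domain[d] and len(selected) < budget:
                selected.append(by_domain[d].pop(0)[1])
    return selected


def co_annotate(
    item_ids: List[str],
    llm_label: Callable[[str], Tuple[str, float]],
    human_label: Callable[[str], str],
    min_confidence: float = 0.8,  # assumed threshold, not from the paper
) -> Tuple[Dict[str, str], int]:
    """Accept the LLM's label when its confidence is high enough;
    otherwise fall back to a human annotator. Returns the labels and
    the number of items that needed human review."""
    labels: Dict[str, str] = {}
    human_calls = 0
    for item_id in item_ids:
        label, confidence = llm_label(item_id)
        if confidence < min_confidence:
            label = human_label(item_id)
            human_calls += 1
        labels[item_id] = label
    return labels, human_calls
```

In this sketch, human effort is spent only on items the LLM is unsure about, while the round-robin step keeps low-resource domains from being crowded out by a single large domain. Both callables would be supplied by the caller; a real system would replace them with an LLM client and an annotation interface.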
Problem

Research questions and friction points this paper is trying to address.

fake news detection
cross-domain
labelled data scarcity
domain-specific features
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-LLM co-annotation
domain-aware active learning
cross-domain fake news detection
domain embedding
collaborative annotation
Esma Aïmeur
Department of Computer Science and Operations Research, University of Montreal, Canada
Gilles Brassard
Professor of computer science, Université de Montréal
quantum information, cryptography, foundations of physics
Dorsaf Sallami
Department of Computer Science and Operations Research, University of Montreal, Canada