CoALFake: Collaborative Active Learning with Human-LLM Co-Annotation for Cross-Domain Fake News Detection

📅 2026-04-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three limitations of existing fake news detection methods: poor generalization, heavy reliance on large amounts of labeled data, and difficulty capturing domain-specific characteristics. The authors propose a cross-domain detection framework that integrates a human–large language model (LLM) collaborative annotation mechanism with a domain-aware active learning strategy. The framework uses LLMs to produce low-cost annotations, with human oversight to keep labels reliable, and combines domain embeddings with a sampling strategy that prioritizes diverse domain coverage, reducing annotation cost while improving the model’s ability to capture cross-domain features. Experimental results show that the proposed approach consistently outperforms a range of baselines across multiple datasets, even under minimal human supervision.
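The collaborative annotation idea described above can be illustrated with a minimal sketch. It assumes a simple confidence-threshold rule: the LLM label is accepted when its confidence is high, and the item is sent to a human otherwise. The routing rule, the `threshold` value, and the names `co_annotate`, `llm_label`, and `human_label` are all hypothetical; the summary does not specify how CoALFake divides work between the LLM and human annotators.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Annotation:
    text: str
    label: str
    source: str  # "llm" or "human"


def co_annotate(
    texts: Sequence[str],
    llm_label: Callable[[str], Tuple[str, float]],  # hypothetical: returns (label, confidence)
    human_label: Callable[[str], str],              # hypothetical: returns a label
    threshold: float = 0.8,                         # assumed cutoff, not from the paper
) -> List[Annotation]:
    """Accept confident LLM labels; route the rest to a human annotator."""
    annotations = []
    for text in texts:
        label, confidence = llm_label(text)
        if confidence >= threshold:
            annotations.append(Annotation(text, label, "llm"))
        else:
            # Low-confidence items fall back to human review.
            annotations.append(Annotation(text, human_label(text), "human"))
    return annotations
```

Raising `threshold` in this sketch sends more items to human review, which trades annotation cost for label reliability.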

📝 Abstract

The proliferation of fake news across diverse domains highlights critical limitations in current detection systems, which often exhibit narrow domain specificity and poor generalization. Existing cross-domain approaches face two key challenges: (1) reliance on labelled data, which is frequently unavailable and resource-intensive to acquire, and (2) information loss caused by rigid domain categorization or neglect of domain-specific features. To address these issues, we propose CoALFake, a novel approach for cross-domain fake news detection that integrates Human-Large Language Model (LLM) co-annotation with domain-aware Active Learning (AL). Our method employs LLMs for scalable, low-cost annotation while maintaining human oversight to ensure label reliability. By integrating domain embedding techniques, CoALFake dynamically captures both domain-specific nuances and cross-domain patterns, enabling the training of a domain-agnostic model. Furthermore, a domain-aware sampling strategy optimizes sample acquisition by prioritizing diverse domain coverage. Evaluations across several datasets show that CoALFake consistently outperforms a range of existing baselines, even with minimal human oversight. Our results emphasize that human-LLM co-annotation is a highly cost-effective approach that delivers excellent performance.
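One way to picture a sampling strategy that prioritizes diverse domain coverage is a round-robin over domains, taking the most uncertain unlabelled item from each domain in turn. This is only a sketch under stated assumptions: it uses hard integer domain IDs for simplicity, whereas the abstract describes domain embeddings and explicitly criticizes rigid domain categorization, and the function name `domain_aware_sample` and the uncertainty scores are hypothetical rather than taken from the paper.

```python
from collections import defaultdict
from typing import List, Sequence


def domain_aware_sample(
    domains: Sequence[int],        # assumed: one domain ID per pool item
    uncertainty: Sequence[float],  # assumed: higher means the model is less sure
    budget: int,
) -> List[int]:
    """Pick up to `budget` pool indices, cycling over domains for coverage."""
    if len(domains) != len(uncertainty):
        raise ValueError("domains and uncertainty must have the same length")

    by_domain = defaultdict(list)
    for idx, domain in enumerate(domains):
        by_domain[domain].append(idx)

    # Within each domain, order candidates from most to least uncertain.
    queues = [
        sorted(indices, key=lambda i: uncertainty[i], reverse=True)
        for _, indices in sorted(by_domain.items())
    ]

    selected: List[int] = []
    rank = 0
    # Take the rank-th candidate from every domain before moving to the next rank.
    while len(selected) < budget and any(rank < len(q) for q in queues):
        for queue in queues:
            if rank < len(queue) and len(selected) < budget:
                selected.append(queue[rank])
        rank += 1
    return selected
```

With pure uncertainty sampling, a pool dominated by one domain would fill the whole budget from that domain; the round-robin here guarantees every domain contributes before any domain contributes twice.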

Problem

Research questions and friction points this paper is trying to address.

fake news detection

cross-domain

labelled data scarcity

domain-specific features

generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-LLM co-annotation

domain-aware active learning

cross-domain fake news detection