🤖 AI Summary
Fake news video detection suffers from scarce, insufficiently diverse training data and from the spurious pattern biases that such data induce. Both problems trace back to the many-to-many mapping between video segments and fabricated events, which existing datasets fail to capture realistically. To address this, we propose AgentAug, a large language model (LLM)-driven framework that simulates how fake news videos are created. AgentAug models the segment-event many-to-many relationship explicitly through multiple generation pipelines covering four fabrication categories, and uses uncertainty-based active learning to select useful augmented samples, improving both the quality and the efficiency of data augmentation. Experiments on two benchmark datasets show that AgentAug consistently improves the performance of short video fake news detectors, mitigating the overfitting caused by scarce, biased data and improving generalization.
📝 Abstract
The spread of fake news on short video platforms has become a significant societal concern, calling for automatic detection methods tailored to news videos. Current detectors rely primarily on pattern-based features to separate fake news videos from real ones, but limited and insufficiently diverse training data lead to biased patterns and hinder performance. This weakness stems from the complex many-to-many relationships between video material segments and fabricated news events in real-world scenarios: a single video clip can be used in multiple ways to create different fake narratives, while a single fabricated event often combines multiple distinct video segments. Existing datasets do not adequately reflect these relationships because large-scale real-world data are difficult to collect and annotate, which results in sparse coverage and an incomplete picture of how fake news videos can be created. To address this issue, we propose AgentAug, a data augmentation framework that generates diverse fake news videos by simulating typical creative processes. AgentAug implements multiple LLM-driven pipelines covering four fabrication categories for news video creation, combined with an active learning strategy based on uncertainty sampling that selects potentially useful augmented samples during training. Experimental results on two benchmark datasets demonstrate that AgentAug consistently improves the performance of short video fake news detectors.
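The uncertainty-sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the uncertainty measure, so this example assumes binary predictive entropy over the detector's P(fake) output; the function names `binary_entropy` and `select_uncertain`, the scores, and the budget `k` are all hypothetical and are not taken from AgentAug itself.

```python
import math


def binary_entropy(p_fake: float) -> float:
    """Entropy (in nats) of a binary fake/real prediction.

    Assumption: entropy is used as the uncertainty score; the paper
    may use a different criterion.
    """
    eps = 1e-12
    p = min(max(p_fake, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_uncertain(p_fake_scores: list[float], k: int) -> list[int]:
    """Return indices of the k samples the detector is least sure about."""
    ranked = sorted(
        range(len(p_fake_scores)),
        key=lambda i: binary_entropy(p_fake_scores[i]),
        reverse=True,  # highest entropy first
    )
    return ranked[:k]


# Hypothetical detector outputs P(fake) for five augmented videos.
scores = [0.98, 0.52, 0.10, 0.45, 0.70]
selected = select_uncertain(scores, k=2)
print(selected)  # indices of the two most uncertain samples
```

In this sketch, augmented videos whose predicted probability lies closest to 0.5 are kept for the next training round, while samples the detector already classifies confidently are skipped.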