Multi-MLLM Knowledge Distillation for Out-of-Context News Detection

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Detecting image-text mismatched news—where real images are decontextualized and misused in disinformation—remains challenging in low-resource settings with scarce labeled data. Method: This paper proposes a multi-teacher collaborative distillation framework to enhance the detection capability of compact multimodal large language models (MLLMs). It introduces a two-stage knowledge distillation strategy: (1) LoRA fine-tuning on the full training set; and (2) targeted refinement on samples where teacher predictions conflict, jointly applying LoRA fine-tuning and direct preference optimization (DPO) to strengthen hard-example modeling. Multi-teacher prompting, knowledge aggregation, and multimodal reasoning distillation further improve generalization. Contribution/Results: The method achieves state-of-the-art performance using less than 10% of the labeled data, substantially outperforming existing zero-shot and few-shot approaches, and strikes an effective balance between accuracy and low-cost deployment for practical misinformation detection.
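The Stage-2 data selection described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the teacher names, sample ids, and label strings ("ooc" / "not_ooc") are assumptions for the example.

```python
# Stage-2 data selection sketch: keep only samples on which the teacher
# MLLMs disagree about the out-of-context label; these are the "hard"
# examples used for the LoRA + DPO refinement stage.

def select_conflict_samples(teacher_preds):
    """teacher_preds: one dict per teacher, mapping sample_id to a
    predicted label ("ooc" = out-of-context, "not_ooc" otherwise).
    Returns the sample ids whose predictions conflict across teachers."""
    conflicts = []
    for sid in teacher_preds[0]:
        labels = {preds[sid] for preds in teacher_preds}
        if len(labels) > 1:  # teachers disagree -> hard example
            conflicts.append(sid)
    return conflicts

# Hypothetical predictions from two teacher MLLMs on three news items.
teacher_a = {"n1": "ooc", "n2": "not_ooc", "n3": "ooc"}
teacher_b = {"n1": "ooc", "n2": "ooc", "n3": "ooc"}
print(select_conflict_samples([teacher_a, teacher_b]))  # -> ['n2']
```

Only the disputed item ("n2") would be forwarded to Stage 2; unanimous samples are assumed to be easy enough to be covered by Stage-1 LoRA fine-tuning alone.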

📝 Abstract
Multimodal out-of-context news is a type of misinformation in which the image is used outside of its original context. Many existing works have leveraged multimodal large language models (MLLMs) for detecting out-of-context news. However, observing the limited zero-shot performance of smaller MLLMs, they generally require label-rich fine-tuning and/or expensive API calls to GPT models to improve the performance, which is impractical in low-resource scenarios. In contrast, we aim to improve the performance of small MLLMs in a more label-efficient and cost-effective manner. To this end, we first prompt multiple teacher MLLMs to generate both label predictions and corresponding rationales, which collectively serve as the teachers' knowledge. We then introduce a two-stage knowledge distillation framework to transfer this knowledge to a student MLLM. In Stage 1, we apply LoRA fine-tuning to the student model using all training data. In Stage 2, we further fine-tune the student model using both LoRA fine-tuning and DPO on the data points where teachers' predictions conflict. This two-stage strategy reduces annotation costs and helps the student model uncover subtle patterns in more challenging cases. Experimental results demonstrate that our approach achieves state-of-the-art performance using less than 10% labeled data.
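The abstract's Stage 2 applies DPO to conflict samples, which requires preference pairs. One plausible construction, sketched below under stated assumptions (the paper does not specify this exact scheme): for each conflict sample, a rationale from a teacher whose prediction matches the gold label serves as the "chosen" response and a conflicting teacher's rationale as "rejected". The `prompt`/`chosen`/`rejected` field names follow the common DPO dataset format.

```python
# Hypothetical DPO preference-pair construction for conflict samples.
# conflict_samples: sample_id -> list of (predicted_label, rationale),
#                   one tuple per teacher MLLM.
# gold_labels:      sample_id -> gold label from the small labeled set.

def build_dpo_pairs(conflict_samples, gold_labels):
    pairs = []
    for sid, teacher_outputs in conflict_samples.items():
        gold = gold_labels[sid]
        chosen = [r for lbl, r in teacher_outputs if lbl == gold]
        rejected = [r for lbl, r in teacher_outputs if lbl != gold]
        # Pair every correct rationale with every incorrect one.
        for c in chosen:
            for r in rejected:
                pairs.append({"prompt": sid, "chosen": c, "rejected": r})
    return pairs

conflicts = {"n2": [("not_ooc", "caption matches the scene"),
                    ("ooc", "event date contradicts the image")]}
pairs = build_dpo_pairs(conflicts, {"n2": "ooc"})
print(pairs[0]["chosen"])  # -> event date contradicts the image
```

Fine-tuning on such pairs pushes the student to prefer the reasoning that leads to the correct label, which is one way the two-stage strategy can "uncover subtle patterns in more challenging cases".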
Problem

Research questions and friction points this paper is trying to address.

Detect multimodal out-of-context news efficiently
Improve small MLLMs without label-rich fine-tuning or costly API calls
Transfer knowledge from teacher to student MLLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-MLLM knowledge distillation for efficiency
Two-stage LoRA and DPO fine-tuning strategy
Label-efficient out-of-context news detection