🤖 AI Summary
This work addresses the limitations of existing hate video detection methods, which suffer from scarce training data, poor interpretability, and unreliable performance when directly employing large vision-language models (LVLMs). To overcome these challenges, the authors propose MARS, a novel framework that introduces, for the first time, a multi-stage adversarial reasoning mechanism to achieve high accuracy and strong interpretability without any model training. MARS neutrally describes video content and concurrently generates evidence both supporting and opposing a hate label, then fuses these perspectives to produce an interpretable final judgment. Evaluated on two real-world datasets, MARS substantially outperforms current training-free approaches—by up to 10% in accuracy—and even surpasses the best trained method on one dataset, while providing human-understandable rationales that enhance transparency and compliance in content moderation.
📝 Abstract
Hateful videos pose serious risks by amplifying discrimination, inciting violence, and undermining online safety. Existing training-based hateful video detection methods are constrained by limited training data and a lack of interpretability, while directly prompting large vision-language models often struggles to deliver reliable hate detection. To address these challenges, this paper introduces MARS, a training-free Multi-stage Adversarial ReaSoning framework that enables reliable and interpretable hateful content detection. MARS begins with an objective description of the video content, establishing a neutral foundation for subsequent analysis. Building on this, it develops evidence-based reasoning that supports potential hateful interpretations, while in parallel incorporating counter-evidence reasoning to capture plausible non-hateful perspectives. Finally, these perspectives are synthesized into a conclusive and explainable decision. Extensive evaluation on two real-world datasets shows that MARS achieves up to a 10% accuracy improvement over other training-free approaches under certain backbones and settings, and outperforms state-of-the-art training-based methods on one dataset. In addition, MARS produces human-understandable justifications, thereby supporting compliance oversight and enhancing the transparency of content moderation workflows. The code is available at https://github.com/Multimodal-Intelligence-Lab-MIL/MARS.
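The staged reasoning described above can be sketched as a sequence of LVLM calls. This is a minimal illustration, not the paper's implementation: the `query_lvlm(prompt, video)` interface, the prompt wording, and the fusion step are all hypothetical placeholders standing in for whatever backbone and prompts MARS actually uses.

```python
def mars_pipeline(video, query_lvlm):
    """Sketch of MARS's multi-stage adversarial reasoning.

    `query_lvlm(prompt, video) -> str` is an assumed generic interface
    to a large vision-language model; prompts here are illustrative.
    """
    # Stage 1: neutral, objective description of the video content.
    description = query_lvlm("Objectively describe this video's content.", video)

    # Stage 2a: evidence-based reasoning for a hateful interpretation.
    pro = query_lvlm(
        f"Given this description:\n{description}\n"
        "List evidence suggesting the video is hateful.",
        video,
    )

    # Stage 2b: counter-evidence reasoning for a non-hateful interpretation.
    con = query_lvlm(
        f"Given this description:\n{description}\n"
        "List evidence suggesting the video is NOT hateful.",
        video,
    )

    # Stage 3: synthesize both perspectives into an explainable verdict.
    verdict = query_lvlm(
        f"Description: {description}\n"
        f"Supporting evidence: {pro}\n"
        f"Counter-evidence: {con}\n"
        "Weigh both sides and decide: hateful or not, with a justification.",
        video,
    )
    return verdict
```

Because each stage is a plain prompt to a frozen model, the whole pipeline remains training-free, and the intermediate pro/con evidence doubles as the human-readable rationale.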