From Static Analysis to Audience Dissemination: A Training-Free Multimodal Controversy Detection Multi-Agent Framework

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing multimodal controversy detection methods: they rely on static representations and fail to capture the diverse perspectives of heterogeneous audiences. To overcome this, the authors propose AuDisAgent, a framework that reframes controversy detection as a dynamic propagation process. It uses a training-free multi-agent system to simulate how audiences with different perspectives evaluate and discuss videos and their comments. Three screening agents (video, comment, and interaction) make initial assessments. When they disagree, a viewing-panel discussion is triggered, and an arbitration agent makes the final decision. The framework also includes a comment bootstrapping strategy that borrows historical comments from semantically similar videos to mitigate the cold-start problem. Experiments on a public dataset show that AuDisAgent significantly outperforms state-of-the-art methods in both comment-rich and comment-scarce scenarios.
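The cold-start fallback seeds a new video's comment context with historical comments from semantically similar videos. Below is a minimal sketch of that retrieval step under stated assumptions. It assumes each historical video already has an embedding vector, ranks those videos by cosine similarity, and bootstraps only when the video has fewer than `min_comments` comments. The helper name `bootstrap_comments`, the parameters `k`, `min_comments`, and `max_comments`, the cosine ranking, and the toy data are all hypothetical. The source does not say how similarity is computed or how many borrowed comments are used.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bootstrap_comments(query_emb: Sequence[float],
                       own_comments: List[str],
                       history: List[Dict],
                       k: int = 2,
                       min_comments: int = 3,
                       max_comments: int = 10) -> List[str]:
    """Hypothetical helper: seed a cold-start video's comment context.

    `history` items are assumed to look like {"emb": [...], "comments": [...]}.
    """
    # Enough real comments: no bootstrapping needed (threshold is an assumption).
    if len(own_comments) >= min_comments:
        return list(own_comments)
    # Rank historical videos by similarity to the new video's embedding.
    ranked = sorted(history, key=lambda h: cosine(query_emb, h["emb"]), reverse=True)
    borrowed = [c for h in ranked[:k] for c in h["comments"]]
    # Keep the video's own comments first, then fill with borrowed ones.
    return (list(own_comments) + borrowed)[:max_comments]


# Toy data: embeddings and comments are made up for illustration.
history = [
    {"emb": [1.0, 0.0, 0.0], "comments": ["Great recipe", "Too salty"]},
    {"emb": [0.0, 1.0, 0.0], "comments": ["This is biased", "Finally someone said it"]},
    {"emb": [0.1, 0.9, 0.0], "comments": ["Both sides are wrong"]},
]
new_video_emb = [0.0, 1.0, 0.1]  # a new video with no comments yet

context = bootstrap_comments(new_video_emb, [], history)
print(context)  # ['This is biased', 'Finally someone said it', 'Both sides are wrong']
```

The function places the video's own comments ahead of the borrowed ones. Borrowed context therefore fills gaps but never displaces real audience reactions. A video that already has enough comments passes through unchanged.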

📝 Abstract

Multimodal controversy detection (MCD) identifies controversial content in videos and their associated user comments to support risk management for social video platforms. Prior research frames MCD as a static representation learning task, where features are directly extracted from videos and their accompanying comments. However, these methods fail to capture the diverse perspectives and evaluations from different audience groups. Inspired by the real-world process of content dissemination among audiences, we propose AuDisAgent, a training-free multi-agent framework that reformulates MCD as a dynamic propagation process.

Our framework explicitly models audience dissemination through a structured multi-agent system. First, three specialized Screening Agents (Video Agent, Comment Agent, and Interaction Agent) conduct initial assessments from visual, textual, and cross-modal perspectives, respectively. For samples where the three agents cannot reach a consensus, a Viewing Panel Agent is activated to simulate post-screening discussions among audiences with diverse backgrounds and stances. This mechanism models how different audience groups interpret and react to the same content, uncovering latent controversial content that may emerge during the dissemination process. Finally, an Arbitration Agent renders the final judgment based on the complete reasoning chain from the preceding steps.

In addition, to address the "cold-start" scenario where newly released videos have few or no comments, we design a Comment Bootstrapping Strategy that leverages historical public comments from semantically similar videos as the initial comment context. Extensive experiments on a public dataset demonstrate that our framework significantly outperforms existing state-of-the-art (SOTA) methods in both rich-comment and limited-comment scenarios.
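The abstract describes a screen, escalate, and arbitrate control flow, sketched minimally below. This is not the authors' implementation. Each agent is a plain Python callable standing in for an LLM-backed agent, and the binary label set is hypothetical. The sketch also assumes that "consensus" means all three screening labels agree and that the Arbitration Agent runs on every sample. The abstract does not say whether unanimous samples skip arbitration. The stubs `make_stub`, `panel_stub`, and `majority_arbiter` are illustrative only. In particular, `majority_arbiter` just counts labels, whereas the described Arbitration Agent reasons over the content of the chain.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical binary label set; the source does not specify the output format.
CONTROVERSIAL = "controversial"
NOT_CONTROVERSIAL = "not_controversial"


@dataclass
class Verdict:
    agent: str      # which agent produced this step
    label: str      # CONTROVERSIAL or NOT_CONTROVERSIAL
    rationale: str  # free-text reasoning (an LLM output in a real system)


ScreeningAgent = Callable[[dict], Verdict]
PanelAgent = Callable[[dict, List[Verdict]], List[Verdict]]
ArbitrationAgent = Callable[[dict, List[Verdict]], Verdict]


def run_pipeline(sample: dict,
                 screeners: Sequence[ScreeningAgent],
                 panel: PanelAgent,
                 arbiter: ArbitrationAgent) -> Tuple[Verdict, List[Verdict]]:
    """Screen, escalate to the panel on disagreement, then arbitrate over the chain."""
    chain = [screen(sample) for screen in screeners]
    # Assumption: "no consensus" means the screening labels are not unanimous.
    if len({v.label for v in chain}) > 1:
        chain += panel(sample, list(chain))
    # Assumption: arbitration runs on every sample, using the full reasoning chain.
    return arbiter(sample, chain), chain


# --- Stub agents standing in for LLM-backed agents (hypothetical) ---
def make_stub(name: str, label: str) -> ScreeningAgent:
    return lambda sample: Verdict(name, label, f"{name} stub rationale")


def panel_stub(sample: dict, verdicts: List[Verdict]) -> List[Verdict]:
    # Simulates two audience personas with opposing stances.
    return [Verdict("panel:supporter", NOT_CONTROVERSIAL, "sees no issue"),
            Verdict("panel:critic", CONTROVERSIAL, "finds it offensive")]


def majority_arbiter(sample: dict, chain: List[Verdict]) -> Verdict:
    # Crude stand-in: majority vote over every label in the chain (ties -> controversial).
    counts = Counter(v.label for v in chain)
    if counts[CONTROVERSIAL] >= counts[NOT_CONTROVERSIAL]:
        label = CONTROVERSIAL
    else:
        label = NOT_CONTROVERSIAL
    return Verdict("arbiter", label, f"majority over {len(chain)} steps")


screeners = [make_stub("video", CONTROVERSIAL),
             make_stub("comment", NOT_CONTROVERSIAL),
             make_stub("interaction", CONTROVERSIAL)]
final, chain = run_pipeline({"video_id": "demo"}, screeners, panel_stub, majority_arbiter)
print(final.label, len(chain))  # controversial 5
```

Agents are passed in as callables, so the routing logic stays independent of any particular model client. Replacing the stubs with real model calls would leave `run_pipeline` unchanged. Only disagreeing samples pay the extra cost of the panel step.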

Problem

Research questions and friction points this paper is trying to address.

multimodal controversy detection

audience perspectives

content dissemination

controversial content identification

cold-start scenario

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free

multi-agent framework

audience dissemination