Cognitive Inception: Agentic Reasoning against Visual Deceptions by Injecting Skepticism

📅 2025-11-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Multimodal large language models (MLLMs) struggle to distinguish AI-generated from authentic visual content, rendering their reasoning vulnerable to visual deception and compromising reliability. Method: We propose Inception, a novel, fully inference-based skeptical enhancement agent architecture. It introduces internal and external skeptical agents that collaboratively perform multi-agent iterative reasoning, emulating human cognitive doubt mechanisms without relying on assumptions about data distribution, training annotations, or generator-specific features. Contribution/Results: Inception enables self-consistent visual authenticity verification through purely reasoning-driven validation. It exhibits strong generalization across diverse generative models and data domains. Evaluated on the AEGIS benchmark, it achieves state-of-the-art performance, significantly outperforming the strongest existing baselines. To our knowledge, Inception is the first framework to systematically enhance MLLMs' generalizable capability for authenticating AI-generated content (AIGC), advancing trustworthy multimodal reasoning.

๐Ÿ“ Abstract
As AI-generated content (AIGC) develops, multimodal large language models (LLMs) struggle to distinguish generated visual inputs from real ones. This shortcoming leaves the models vulnerable to visual deceptions, where they are misled by generated content and the reliability of their reasoning processes is jeopardized. Facing rapidly emerging generative models and diverse data distributions, it is therefore vital to improve LLMs' generalizable reasoning for verifying the authenticity of visual inputs against potential deceptions. Inspired by human cognitive processes, we discover that LLMs tend to over-trust visual inputs, while injecting skepticism significantly improves their visual cognitive capability against visual deceptions. Based on this discovery, we propose Inception, a fully reasoning-based agentic framework that conducts generalizable authenticity verification by injecting skepticism, iteratively enhancing the LLMs' reasoning logic between External Skeptic and Internal Skeptic agents. To the best of our knowledge, this is the first fully reasoning-based framework against AIGC visual deceptions. Our approach achieves a large margin of improvement over the strongest existing LLM baselines and state-of-the-art performance on the AEGIS benchmark.
Problem

Research questions and friction points this paper is trying to address.

Improving LLMs' ability to verify visual input authenticity against AI-generated deceptions
Addressing vulnerability of multimodal models to visual deception from generated content
Enhancing generalizable reasoning through skepticism injection against AIGC visual manipulations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Injecting skepticism into multi-modal LLMs
Agentic reasoning framework with iterative enhancement
Fully reasoning-based approach against visual deceptions
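The summary and abstract describe Inception as an iterative loop in which an Internal Skeptic and an External Skeptic alternately challenge the model's reasoning about a visual input. The paper does not publish its agent prompts or verification logic, so the following is a purely illustrative sketch under assumed names (`internal_skeptic`, `external_skeptic`, `inception_verify`, and the doubt-counting heuristic are all hypothetical); a real system would query an MLLM at each step rather than manipulate strings.

```python
# Hypothetical sketch of an Inception-style skeptic loop.
# All function names and the doubt-counting heuristic are assumptions,
# not the paper's actual method.
from dataclasses import dataclass


@dataclass
class Verdict:
    authentic: bool      # final authenticity judgment
    confidence: float    # crude proxy in [0, 1]
    rationale: str       # accumulated reasoning trace


def internal_skeptic(rationale: str) -> str:
    """Challenge the model's own reasoning: flag over-trusting claims."""
    if "looks real" in rationale:
        return rationale + " [doubt: surface realism is not evidence of authenticity]"
    return rationale


def external_skeptic(rationale: str) -> str:
    """Inject adversarial doubt from outside: assume the input may be generated."""
    return rationale + " [doubt: check lighting, texture, and semantic consistency]"


def inception_verify(initial_rationale: str, rounds: int = 3) -> Verdict:
    """Iteratively refine the reasoning trace between the two skeptic agents."""
    rationale = initial_rationale
    for _ in range(rounds):
        rationale = internal_skeptic(rationale)
        rationale = external_skeptic(rationale)
    # In place of a real MLLM judgment, count injected doubts as a stand-in signal.
    doubts = rationale.count("[doubt:")
    return Verdict(
        authentic=doubts < 2,
        confidence=min(1.0, doubts / 6),
        rationale=rationale,
    )
```

The point of the sketch is the control flow: each round passes the evolving rationale through both skeptics, so an over-trusting initial judgment ("the image looks real") accumulates explicit doubts instead of being accepted at face value.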
🔎 Similar Papers
No similar papers found.