To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning

📅 2026-01-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the susceptibility of multimodal large language models (MLLMs) to perceptual fragility and hallucination in complex visual scenes, as well as their reliance on static, costly training data. To this end, the authors propose the Adversarial Opponent Training (AOT) framework, which introduces self-play reinforcement learning into MLLM robustness training for the first time. AOT dynamically generates adversarial examples through the co-evolution of an image-editing Attacker and an MLLM Defender, establishing a scalable training loop. Combined with supervised fine-tuning on AOT-SFT, a large-scale adversarial dataset, the approach significantly enhances the model's perceptual robustness in complex scenes and effectively suppresses hallucinations, demonstrating the effectiveness and scalability of the AOT framework.
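The co-evolution loop the summary describes can be illustrated with a deliberately tiny stand-in. This is an assumption-laden sketch, not the authors' implementation: a linear classifier plays the MLLM Defender, a deterministic feature shift plays the image-editing Attacker, and a perceptron-style correction stands in for the RL/SFT update; only the loop structure mirrors the paper.

```python
def attacker_perturb(image, label, strength):
    """Attacker 'edit' (toy): shift every feature against the true label."""
    return [x - label * strength for x in image]

def defender_predict(weights, image):
    """Defender (toy): the sign of a linear score stands in for an MLLM answer."""
    score = sum(w * x for w, x in zip(weights, image))
    return 1 if score >= 0 else -1

def self_play_round(weights, strength, data, lr):
    """One co-evolution round: attack each example, update the Defender on
    its mistakes, then adjust the Attacker's curriculum strength."""
    mistakes = 0
    for image, label in data:
        adv = attacker_perturb(image, label, strength)
        if defender_predict(weights, adv) != label:  # Defender fooled
            # Perceptron-style fix: a stand-in for RL/SFT on adversarial data.
            weights = [w + lr * label * x for w, x in zip(weights, adv)]
            mistakes += 1
    # Curriculum: escalate when the Defender resists, back off when it fails.
    strength = min(strength * 1.1, 1.0) if mistakes == 0 else strength * 0.95
    return weights, strength, mistakes

# Toy "images" as 2-feature vectors with binary labels (hypothetical data).
data = [([1.0, 0.5], 1), ([-1.0, -0.5], -1), ([0.8, 1.2], 1), ([-0.7, -1.1], -1)]
weights, strength = [0.0, 0.0], 0.2
for _ in range(5):
    weights, strength, mistakes = self_play_round(weights, strength, data, lr=0.1)
print(mistakes)  # → 0: the Defender adapts and resists the final round's attacks
```

In the actual framework the Attacker is an image-editing model producing a dynamic curriculum of manipulations and the Defender is trained with reinforcement learning plus supervised fine-tuning on AOT-SFT; the sketch only conveys the attack–update–escalate loop.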

πŸ“ Abstract
Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) exhibit perceptual fragility when confronted with visually complex scenes. This weakness stems from a reliance on finite training datasets, which are prohibitively expensive to scale and impose a ceiling on model robustness. We introduce **AOT-SFT**, a large-scale adversarial dataset for bootstrapping MLLM robustness. Building on this, we propose **AOT (Adversarial Opponent Training)**, a self-play framework that forges MLLM robustness by creating its own training data. Our method orchestrates a co-evolution between an image-editing Attacker and a Defender MLLM, where the Attacker generates a diverse and dynamic curriculum of image manipulations, forcing the Defender to adapt and improve. Extensive experiments demonstrate that AOT enhances the Defender's perceptual robustness and reduces hallucinations, establishing a scalable paradigm for training more reliable MLLMs.
Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models
perceptual fragility
visual complexity
training data limitation
model robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Reinforcement Learning
Multimodal Large Language Models
Self-play Training
Perceptual Robustness
Adversarial Dataset