Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work addresses the inefficiency and accuracy degradation of large vision-language models (VLMs) on simple tasks, where over-reasoning often produces unnecessarily verbose responses. Whereas prior approaches overlook visual perception failure as a fundamental bottleneck, this paper proposes GPRO, a framework that, for the first time, decouples perception failures from reasoning errors. GPRO constructs supervision signals based on failure attribution and introduces a meta-reasoning controller that dynamically selects among a lightweight fast path, a slow perception path, and a slow reasoning path. A teacher model generates approximately 790,000 failure-attribution labels, and the path-selection strategy is optimized via multi-objective reinforcement learning. Experiments demonstrate that GPRO significantly improves both accuracy and inference efficiency across five benchmarks, outperforming existing "slow thinking" methods while producing more concise responses.
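The summary above describes a controller that routes each generation step to one of three paths. The paper's actual gating mechanism is not detailed here; as a rough illustration only, a threshold-based gate over hypothetical per-step confidence signals (the names `perception_conf`, `reasoning_conf`, and the thresholds are invented for this sketch) might look like:

```python
from enum import Enum

class Path(Enum):
    FAST = "fast"                        # answer directly, minimal compute
    SLOW_PERCEPTION = "slow_perception"  # re-examine the visual input
    SLOW_REASONING = "slow_reasoning"    # internal self-reflection

def route(perception_conf: float, reasoning_conf: float,
          tau_p: float = 0.6, tau_r: float = 0.6) -> Path:
    """Toy gating rule (not the paper's learned controller):
    re-ground visually when perception is uncertain, reflect when
    reasoning is uncertain, otherwise take the lightweight fast path."""
    if perception_conf < tau_p:
        return Path.SLOW_PERCEPTION
    if reasoning_conf < tau_r:
        return Path.SLOW_REASONING
    return Path.FAST
```

In GPRO the selection is learned from failure-attribution supervision rather than fixed thresholds; this sketch only conveys the three-way routing structure.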

📝 Abstract
Large Vision-Language Models (LVLMs) have exhibited strong reasoning capabilities through chain-of-thought mechanisms that generate step-by-step rationales. However, such slow-thinking approaches often lead to overthinking, where models produce excessively verbose responses even for simple queries, resulting in test-time inefficiency and even degraded accuracy. Prior work has attempted to mitigate this issue via adaptive reasoning strategies, but these methods largely overlook a fundamental bottleneck: visual perception failures. We argue that stable reasoning critically depends on low-level visual grounding, and that reasoning errors often originate from imperfect perception rather than insufficient deliberation. To address this limitation, we propose Gated Perception-Reasoning Optimization (GPRO), a meta-reasoning controller that dynamically routes computation among three decision paths at each generation step: a lightweight fast path, a slow perception path for re-examining visual inputs, and a slow reasoning path for internal self-reflection. To learn this distinction, we derive large-scale failure attribution supervision from approximately 790k samples, using teacher models to distinguish perceptual hallucinations from reasoning errors. We then train the controller with multi-objective reinforcement learning to optimize the trade-off between task accuracy and computational cost under uncertainty. Experiments on five benchmarks demonstrate that GPRO substantially improves both accuracy and efficiency, outperforming recent slow-thinking methods while generating significantly shorter responses.
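The abstract states that the controller is trained with multi-objective reinforcement learning to trade off task accuracy against computational cost. The paper's actual objective is not given on this page; a common way to realize such a trade-off is a scalarized reward that credits correctness and penalizes response length. The function below is a hypothetical sketch under that assumption (`lam` and the token-count penalty are illustrative, not the authors' formulation):

```python
def reward(correct: bool, tokens_used: int, lam: float = 0.001) -> float:
    """Illustrative scalarization of an accuracy/cost trade-off:
    +1 for a correct answer, minus a per-token cost penalty.
    This is a generic RL reward-shaping pattern, not GPRO's objective."""
    return (1.0 if correct else 0.0) - lam * tokens_used
```

Under such a reward, verbose-but-correct answers score lower than concise correct ones, which is consistent with the paper's goal of curbing overthinking while preserving accuracy.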
Problem

Research questions and friction points this paper is trying to address.

overthinking
large vision-language models
visual perception failures
reasoning errors
test-time inefficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gated Perception-Reasoning Optimization
visual perception failure
meta-reasoning controller
overthinking mitigation
multi-objective reinforcement learning