🤖 AI Summary
This work addresses a challenge in multimodal large language models: during reinforcement learning, reasoning chains often become decoupled from visual evidence because visual information is not effectively fused into the reasoning process. To bridge this gap, the authors propose Trajectory-Guided Reinforcement Learning (TGRL), which introduces expert reasoning trajectories into the multimodal Reinforcement Learning with Verifiable Rewards (RLVR) framework for the first time. TGRL employs a fine-grained guidance strategy to align the policy model with visual inputs, and combines token-level reweighting with trajectory filtering to stabilize training dynamics. This approach achieves deep integration of visual perception and logical reasoning, significantly outperforming prior methods on multiple multimodal reasoning benchmarks and overcoming the limitations of conventional approaches that focus narrowly on visual grounding without holistic reasoning alignment.
📝 Abstract
Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) for multimodal large language models (MLLMs) have mainly focused on improving final answer correctness and strengthening visual grounding. However, a critical bottleneck remains: although models can attend to relevant visual regions, they often fail to effectively incorporate visual evidence into subsequent reasoning, leading to reasoning chains that are weakly grounded in visual facts. To address this issue, we propose Trajectory-Guided Reinforcement Learning (TGRL), which guides the policy model to integrate visual evidence into fine-grained reasoning processes using expert reasoning trajectories from stronger models. We further introduce token-level reweighting and trajectory filtering to ensure stable and effective policy optimization. Extensive experiments on multiple multimodal reasoning benchmarks demonstrate that TGRL consistently improves reasoning performance and effectively bridges the gap between visual perception and logical reasoning.
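The abstract does not spell out the training objective, but the two mechanisms it names, token-level reweighting and trajectory filtering, can be illustrated with a minimal REINFORCE-style sketch. Everything below is an assumption for illustration: the trajectory fields (`logprobs`, `weights`, `reward`), the reward threshold, and the per-token weighting scheme are hypothetical stand-ins, not the paper's actual formulation.

```python
def filter_trajectories(trajectories, reward_threshold=0.5):
    """Trajectory filtering (hypothetical sketch): keep only rollouts
    whose verifiable reward clears a threshold, so low-quality
    trajectories do not contribute gradient signal."""
    return [t for t in trajectories if t["reward"] >= reward_threshold]


def reweighted_pg_loss(trajectories):
    """Token-level reweighting (hypothetical sketch): a REINFORCE-style
    loss where each token's -logprob * reward term is scaled by a
    per-token weight, e.g. upweighting tokens that align with an
    expert reasoning trajectory."""
    total, n_tokens = 0.0, 0
    for t in trajectories:
        for logp, w in zip(t["logprobs"], t["weights"]):
            total += -w * t["reward"] * logp  # weighted policy-gradient term
            n_tokens += 1
    return total / max(n_tokens, 1)  # mean over surviving tokens


# Toy rollouts: the second has zero verifiable reward and is filtered out.
trajs = [
    {"logprobs": [-0.1, -0.3], "weights": [1.0, 2.0], "reward": 1.0},
    {"logprobs": [-2.0, -1.5], "weights": [1.0, 1.0], "reward": 0.0},
]
kept = filter_trajectories(trajs)
loss = reweighted_pg_loss(kept)  # 0.35 on this toy example
```

In a real RLVR pipeline these terms would sit inside a clipped policy-gradient objective over batched tensors; the sketch only shows where the per-token weights and the trajectory-level filter enter the loss.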