Boosting Reasoning in Large Multimodal Models via Activation Replay

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the intrinsic mechanisms underlying the reasoning capabilities of large multimodal models (LMMs) trained via reinforcement learning with verifiable rewards (RLVR), and identifies, for the first time, the critical role of low-entropy activation states in multimodal reasoning. Building on this insight, the authors propose Activation Replay: a training-free method, requiring neither fine-tuning nor policy optimization, that at test time identifies low-entropy visual and linguistic tokens via the logit lens and modulates their activations by replaying the corresponding activations from the base model. Operating solely through input-token manipulation, Activation Replay enables cross-model reasoning enhancement without architectural or parameter modifications. Experiments demonstrate significant Pass@K improvements across mathematical reasoning, o3-like visual agent tasks, and video understanding, yielding more stable, broader-coverage reasoning than either replaying high-entropy activations or direct cross-model intervention baselines.

📝 Abstract
Recently, Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach to incentivizing reasoning capability in Large Multimodal Models (LMMs), yet the underlying mechanisms behind this post-training paradigm remain poorly understood. We begin by exploring how input activations are affected by RLVR through the perspective of the logit lens. Our systematic investigations across multiple post-trained LMMs suggest that RLVR unexpectedly shifts low-entropy activations, while high-entropy ones are less affected. We further demonstrate through controlled experiments that this phenomenon is associated with LMM reasoning, suggesting a potentially beneficial role for modulating low-entropy activations. To this end, we propose Activation Replay, a simple yet effective training-free approach that boosts the multimodal reasoning of post-trained LMMs without requiring expensive policy optimization. Our design manipulates visual tokens at test time, replaying low-entropy activations from the input context of base LMMs to regulate their RLVR counterparts. Activation Replay triggers better reasoning across diverse scenarios, including mathematics, o3-like visual agents, and video reasoning. We further show that Activation Replay boosts Pass@K and mitigates the narrowed reasoning coverage induced by RLVR. Our design is compared against alternative choices, such as replaying high-entropy activations instead of low-entropy ones, or direct cross-model intervention instead of manipulating input tokens, demonstrating the superiority of our implementation. Code will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Understanding RLVR's impact on activation entropy in multimodal models
Enhancing reasoning without costly training through activation manipulation
Improving reasoning coverage across mathematical and visual domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replays low-entropy activations from the base model
Manipulates visual tokens at test time
Boosts reasoning without retraining or policy optimization
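The paper does not spell out its exact intervention mechanics here, but the steps above can be sketched in a minimal, self-contained form: project intermediate activations through the unembedding matrix (the logit lens), score each token position by the entropy of the induced distribution, and replay the base model's activations at the lowest-entropy positions into the RLVR model. The function and parameter names (`activation_replay`, `W_U`, `k`) are hypothetical, and real models would operate on framework tensors rather than NumPy arrays.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_entropy(hidden, W_U):
    # Logit lens: project intermediate activations (seq, d_model) through
    # the unembedding matrix W_U (d_model, vocab), then measure the Shannon
    # entropy of each token position's induced next-token distribution.
    probs = softmax(hidden @ W_U)                      # (seq, vocab)
    return -(probs * np.log(probs + 1e-12)).sum(-1)    # (seq,)

def activation_replay(base_hidden, rlvr_hidden, W_U, k):
    # Replay: overwrite the RLVR model's activations at the k lowest-entropy
    # token positions (entropy measured on the base model) with the base
    # model's activations at those same positions.
    ent = token_entropy(base_hidden, W_U)
    low = np.argsort(ent)[:k]                          # low-entropy indices
    out = rlvr_hidden.copy()
    out[low] = base_hidden[low]
    return out, low
```

In a real LMM this selection would be applied to visual and linguistic input tokens during the forward pass; the sketch only illustrates the select-and-replace logic on toy arrays.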