ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code

📅 2025-10-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models (LRMs) frequently exhibit "overthinking" on simple tasks, resulting in redundant inference steps and excessive token consumption. Method: We propose the first general-purpose adaptive inference framework for LRMs, which jointly optimizes inference length, multimodal understanding, and executable code generation via reinforcement learning. Our approach introduces a length-aware reward mechanism to dynamically regulate the number of reasoning steps, without relying on task-specific heuristics. It supports generalized reasoning across heterogeneous input formats and modalities. Contribution/Results: Experiments demonstrate that our framework maintains task performance comparable to the baseline GRPO model while reducing average token consumption by over 70%. This substantial efficiency gain significantly enhances both inference speed and practical deployability of LRMs.
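A minimal sketch of the kind of length-aware reward the summary describes, paired with GRPO-style group normalization. The function names, the linear penalty shape, and the `budget` and `alpha` parameters are illustrative assumptions, not the paper's actual formulation:

```python
def length_aware_reward(is_correct: bool, num_tokens: int,
                        budget: int = 512, alpha: float = 0.5) -> float:
    """Correctness reward discounted by response length (hypothetical).

    Correct answers earn a base reward of 1.0, reduced as the response
    approaches the token budget, so shorter correct reasoning scores
    higher. The penalty is clipped at alpha, so a long correct answer
    still outscores an incorrect one. Incorrect answers get no length
    penalty: there is nothing to discount.
    """
    base = 1.0 if is_correct else 0.0
    penalty = alpha * min(num_tokens / budget, 1.0)
    return base - penalty if is_correct else base


def grpo_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize rewards within a sampled group
    (subtract the group mean, divide by the group std)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    std = std or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]


# A concise correct answer beats a verbose correct one, which in turn
# beats an incorrect one:
print(length_aware_reward(True, 100))   # short and correct
print(length_aware_reward(True, 500))   # long and correct
print(length_aware_reward(False, 50))   # short but wrong
```

Under this shape, the policy gradient pushes the model toward the shortest responses that remain correct, which is the adaptive behavior the summary attributes to ARM2.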

Technology Category

Application Category

πŸ“ Abstract
Large Reasoning Models (LRMs) often suffer from the "over-thinking" problem, generating unnecessarily long reasoning on simple tasks. Some strategies have been proposed to mitigate this issue, such as length penalties or routing mechanisms, but they are typically heuristic and task-specific, lacking a general framework for adaptive reasoning. In this paper, we present ARM2, a unified model that adaptively balances reasoning performance and efficiency across multiple formats through a reinforcement learning framework augmented with length-aware optimization. Beyond conventional natural language inference, ARM2 integrates vision understanding, extending its applicability to multimodal tasks. Moreover, ARM2 integrates executable code into reasoning, enabling substantial reductions in token cost while preserving task performance compared to long CoT. Experiments demonstrate that ARM2 achieves performance on par with traditional reasoning models trained with GRPO, while reducing token usage by over 70% on average. We further conduct extensive analyses to validate the effectiveness of ARM2 and the soundness of its design.
Problem

Research questions and friction points this paper is trying to address.

ARM2 addresses over-thinking in large reasoning models
It adaptively balances reasoning performance and efficiency
It integrates vision understanding and executable code
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning framework with length-aware optimization
Integrates vision understanding for multimodal reasoning
Uses executable code to reduce token usage
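The third bullet can be illustrated with a toy sketch of code-augmented reasoning: instead of spelling out arithmetic step by step in natural language, the model emits a short program whose execution produces the answer in far fewer tokens than a long CoT trace. The helper name, the `answer` variable convention, and the example model output are all hypothetical, not ARM2's actual interface:

```python
def run_reasoning_code(code: str, result_name: str = "answer"):
    """Execute model-emitted code and read back the named result
    variable. A real system would sandbox this execution; plain exec()
    is used here only to keep the sketch self-contained."""
    ns: dict = {}
    exec(code, ns)
    return ns[result_name]


# Hypothetical model output: one line of code replaces ten lines of
# step-by-step arithmetic (sum of squares 1..10).
model_output = "answer = sum(i * i for i in range(1, 11))"
print(run_reasoning_code(model_output))  # → 385
```

The token savings come from delegating mechanical computation to the interpreter, so the model's generated text carries only the program, not the intermediate arithmetic.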
🔎 Similar Papers
No similar papers found.