ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping

📅 2025-10-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Multimodal Large Reasoning Models (MLRMs) suffer from a mismatch wherein they over-reason on simple tasks yet under-explore complex ones. To address this, we propose ARES: a task-difficulty-aware adaptive reasoning framework for multimodal reasoning. Its core innovation is token-level difficulty-aware entropy shaping—introducing, for the first time, a sliding-window-based high-entropy token identification mechanism to locate critical reasoning steps, coupled with a two-stage training paradigm that synergizes cold-start initialization and dynamic exploration. ARES further integrates windowed entropy computation, Adaptive Entropy Policy Optimization (AEPO), hierarchical entropy rewards, and dynamic KL divergence constraints to precisely balance reasoning depth and breadth. Evaluated on mathematical, logical, and multimodal benchmarks, ARES achieves performance on par with or surpassing commercial systems while significantly reducing inference cost—e.g., 23–38% fewer tokens—demonstrating the effectiveness and generalizability of difficulty-driven adaptive reasoning.

Technology Category

Application Category

📝 Abstract
Recent advances in multimodal large reasoning models (MLRMs) have substantially improved their ability to solve complex textual and visual tasks. However, these models tend to overthink on simple problems, producing unnecessarily lengthy reasoning traces, while under-exploring on challenging ones, leading to missed solutions. To address this imbalance, we propose ARES, a unified open-source framework for adaptive reasoning that dynamically allocates exploration effort based on task difficulty. Our approach is motivated by two key empirical findings: (i) while single-token entropy is noisy, high window-entropy (HWE) tokens (token-level entropies averaged under a sliding window) can reliably capture reasoning-critical moments; and (ii) reducing HWE usage benefits easy problems, while increasing it is essential for solving hard ones. Building on these insights, ARES introduces a two-stage training pipeline. In the Adaptive Cold-Start stage, we curate multimodal and textual data paired with reasoning traces of length proportional to problem difficulty, equipping the model with initial difficulty awareness. In the second stage, we develop Adaptive Entropy Policy Optimization (AEPO), which uses HWE tokens as exploration triggers to decide when to explore, and a hierarchical entropy reward with dynamic KL control to decide how much to explore. Extensive experiments demonstrate that ARES achieves superior performance and reasoning efficiency across diverse mathematical, logical, and multimodal benchmarks, while closing the gap to leading commercial systems under significantly lower inference costs.
Problem

Research questions and friction points this paper is trying to address.

Addresses imbalance in reasoning effort between simple and complex multimodal problems
Dynamically allocates exploration based on task difficulty using entropy shaping
Reduces overthinking on easy tasks while improving hard problem solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic exploration effort allocation via difficulty awareness
HWE tokens as triggers for adaptive reasoning control
Two-stage training with entropy shaping and optimization
🔎 Similar Papers
No similar papers found.
S
Shuang Chen
University of California, Los Angeles
Y
Yue Guo
University of California, Los Angeles
Y
Yimeng Ye
Columbia University
Shijue Huang
Shijue Huang
Hong Kong University of Science and Technology
Large Language ModelsReasoningAgent
W
Wenbo Hu
University of California, Los Angeles
H
Haoxi Li
The Hong Kong University of Science and Technology
M
Manyuan Zhang
The Chinese University of Hong Kong
J
Jiayu Chen
The Hong Kong University of Science and Technology
Song Guo
Song Guo
Chair Professor of CSE, HKUST
Large Language ModelEdge AIMachine Learning Systems
N
Nanyun Peng
University of California, Los Angeles