SketchThinker-R1: Towards Efficient Sketch-Style Reasoning in Large Multimodal Models

📅 2026-01-06

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the computational overhead that verbose, step-by-step reasoning imposes on large multimodal models, namely higher token costs and longer response times. Inspired by the concise, goal-directed sketch-style reasoning that humans often use, the authors propose a three-stage framework: a sketch-mode cold-start fine-tuning stage, training of a SketchJudge reward model, and reinforcement learning supervised by SketchJudge. Evaluated on four benchmarks, the method reduces reasoning token cost by over 64% without compromising final answer accuracy. Qualitative analysis further indicates that sketch-style reasoning focuses more on key cues during problem solving.
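To make the headline efficiency figure concrete, the following minimal sketch computes a token-reduction ratio from per-example reasoning token counts. It assumes the reduction is measured as one minus the ratio of mean token counts, which the summary does not spell out; the function name `token_reduction` and the sample counts are hypothetical and are not results from the paper.

```python
from typing import Sequence


def token_reduction(baseline_tokens: Sequence[int], sketch_tokens: Sequence[int]) -> float:
    """Fractional drop in mean reasoning tokens relative to a baseline.

    Assumption: reduction = 1 - mean(sketch) / mean(baseline).
    """
    if not baseline_tokens or not sketch_tokens:
        raise ValueError("both token lists must be non-empty")
    base_mean = sum(baseline_tokens) / len(baseline_tokens)
    sketch_mean = sum(sketch_tokens) / len(sketch_tokens)
    return 1.0 - sketch_mean / base_mean


# Made-up counts for illustration only.
baseline = [400, 600, 500]   # tokens per response, long reasoning
sketch = [150, 200, 175]     # tokens per response, sketch-style reasoning
reduction = token_reduction(baseline, sketch)
print(f"{reduction:.1%}")
```

A reduction above 0.64 under this definition would mean the sketch-style responses use roughly a third of the baseline's reasoning tokens or fewer.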

📝 Abstract

Despite the empirical success of extensive, step-by-step reasoning in large multimodal models, long reasoning processes inevitably incur substantial computational overhead, namely higher token costs and longer response times, which undermines inference efficiency. In contrast, humans often employ sketch-style reasoning: a concise, goal-directed cognitive process that prioritizes salient information and enables efficient problem-solving. Inspired by this cognitive efficiency, we propose SketchThinker-R1, which incentivizes sketch-style reasoning ability in large multimodal models. Our method consists of three primary stages. In the Sketch-Mode Cold Start stage, we convert standard long reasoning processes into sketch-style reasoning and fine-tune the base multimodal model, instilling an initial sketch-style reasoning capability. Next, we train the SketchJudge Reward Model, which explicitly evaluates the model's thinking process and assigns higher scores to sketch-style reasoning. Finally, we conduct Sketch-Thinking Reinforcement Learning under the supervision of SketchJudge to further generalize the sketch-style reasoning ability. Experimental evaluation on four benchmarks shows that SketchThinker-R1 achieves over a 64% reduction in reasoning token cost without compromising final answer accuracy. Qualitative analysis further shows that sketch-style reasoning focuses more on key cues during problem solving.
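As a rough illustration of the third stage, the sketch below shows one way a reinforcement-learning reward could combine answer correctness with a score from a reasoning judge. This is not the paper's reward formulation: the additive form, the `style_weight` parameter, the `Rollout` type, and `toy_judge` (a word-count heuristic standing in for the learned SketchJudge model) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rollout:
    reasoning: str  # the model's thinking trace
    answer: str     # the final answer extracted from the response


def toy_judge(reasoning: str, budget: int = 50) -> float:
    """Hypothetical stand-in for a learned judge: shorter traces score higher.

    A real reward model would score reasoning style, not just length.
    """
    n_words = len(reasoning.split())
    return max(0.0, 1.0 - n_words / budget)


def combined_reward(
    rollout: Rollout,
    reference_answer: str,
    judge: Callable[[str], float],
    style_weight: float = 0.5,  # assumed weighting, not taken from the paper
) -> float:
    """Accuracy term plus a weighted style term (additive form is an assumption)."""
    correct = float(rollout.answer.strip().lower() == reference_answer.strip().lower())
    style = min(1.0, max(0.0, judge(rollout.reasoning)))  # clamp judge score to [0, 1]
    return correct + style_weight * style


short = Rollout(reasoning="area = 3*4 = 12", answer="12")
long = Rollout(reasoning=" ".join(["step"] * 40), answer="12")
r_short = combined_reward(short, "12", toy_judge)
r_long = combined_reward(long, "12", toy_judge)
print(r_short, r_long)
```

With equal correctness, the shorter trace receives the higher reward, which is the kind of pressure toward concise reasoning the abstract describes. How the authors actually balance accuracy against the judge score is not stated here.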

Problem

Research questions and friction points this paper is trying to address.

reasoning efficiency

computational overhead

token cost

large multimodal models

sketch-style reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

sketch-style reasoning

multimodal reasoning efficiency

reward modeling