🤖 AI Summary
Large reasoning models (LRMs) excel at generating long chain-of-thought (CoT) sequences, yet their internal reasoning mechanisms remain poorly understood—particularly the heterogeneous reasoning behaviors exhibited by models achieving identical accuracy. Method: We propose ReJump, the first framework to model CoT as a node-visitation sequence over a tree-structured reasoning space, explicitly distinguishing adjacent versus non-adjacent "jumps" to capture fine-grained reasoning actions—including computation, backtracking, and verification. Using LLM agents to extract reasoning trajectories and designing jump-based statistical metrics, we systematically characterize implicit reasoning strategies across models and tasks. Contribution/Results: Our analysis reveals substantial inter-model and inter-task variation in latent reasoning patterns, and shows that training methodology profoundly shapes reasoning style. Furthermore, ReJump-informed candidate selection significantly improves reasoning quality. This work establishes a novel paradigm for interpretable analysis and controllable optimization of LRMs.
📝 Abstract
Large Reasoning Models (LRMs) are Large Language Models (LLMs) explicitly trained to generate long-form Chains-of-Thought (CoTs), achieving impressive success on challenging tasks like math and programming. However, their underlying reasoning "algorithms" remain poorly understood. To investigate this, we propose ReJump, which represents a reasoning trace as a visitation order over nodes in a tree of intermediate problem-solving steps. Transitions between nodes, which we term jumps, include adjacent moves that capture behaviors such as calculation, and non-adjacent moves that capture behaviors such as backtracking and verification. ReJump enables analyzing LLM reasoning with diverse metrics that quantify exploration, exploitation, overthinking, forgetting, and verification. Using our proposed LLM agent to extract reasoning traces into ReJump format, we evaluate state-of-the-art LRMs on two tasks and find that models with similar accuracy can exhibit distinct reasoning behaviors, while different tasks favor different reasoning styles (e.g., varying balance between exploration and exploitation). To further understand how learning strategies shape reasoning, we use ReJump to compare distilled LRMs with their teachers and CoT-prompted LLMs with LRMs, and to examine how the number of reasoning examples and reinforcement learning affect reasoning behavior. Finally, we show that ReJump can improve reasoning quality at test time through strategies such as ReJump-guided Best-of-N selection and prompt selection. Our code is publicly available at https://github.com/UW-Madison-Lee-Lab/ReJump.
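The core representation described above can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the paper's actual implementation, which is in the linked repository): a reasoning tree is stored as a parent map, a trace is a list of visited node IDs, and each transition is classified as an adjacent move or a non-adjacent jump; the jump rate and re-visit count stand in, loosely, for the exploration and verification metrics the abstract mentions.

```python
# Hypothetical sketch of ReJump-style trace statistics.
# The tree, node IDs, and metric names below are illustrative assumptions,
# not the framework's real data format.

# Reasoning tree: node -> parent (root has parent None).
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}

def is_adjacent(a, b):
    """Two nodes are adjacent iff one is the parent of the other."""
    return parent.get(a) == b or parent.get(b) == a

def jump_stats(trace):
    """Classify each transition in a visitation sequence as an adjacent
    move (e.g., continuing a calculation) or a non-adjacent jump
    (e.g., backtracking to another branch), and count re-visits
    (a rough proxy for verification/overthinking)."""
    adjacent = non_adjacent = revisits = 0
    seen = set()
    for a, b in zip(trace, trace[1:]):
        if is_adjacent(a, b):
            adjacent += 1
        else:
            non_adjacent += 1
        if b in seen:
            revisits += 1
        seen.add(a)
    total = max(adjacent + non_adjacent, 1)
    return {
        "adjacent": adjacent,
        "non_adjacent": non_adjacent,
        "jump_rate": non_adjacent / total,  # exploration proxy
        "revisits": revisits,               # verification proxy
    }

# A trace that descends one branch, jumps to a sibling subtree,
# then jumps back to an earlier node (verification-like behavior).
print(jump_stats([0, 1, 3, 2, 5, 1]))
# → {'adjacent': 3, 'non_adjacent': 2, 'jump_rate': 0.4, 'revisits': 1}
```

Per-model distributions of such statistics are what would let two models with identical accuracy be told apart by reasoning style, e.g., one with a high jump rate (exploratory) versus one with long runs of adjacent moves (exploitative).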