Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high latency and computational overhead that large reasoning models incur through structural redundancy in long reasoning chains. To mitigate this, the authors propose CoSMo, a framework that combines consistency-guided dynamic split-merge optimization, structure-aligned reinforcement learning, and a segment-level computational budget to streamline reasoning chains while preserving logical completeness. Experiments show that CoSMo improves accuracy by an average of 3.3 percentage points across multiple benchmarks and backbone models while reducing the number of reasoning segments by 28.7%, striking a favorable balance between efficiency and reasoning capability.

📝 Abstract
While Large Reasoning Models (LRMs) have demonstrated impressive capabilities in solving complex tasks through the generation of long reasoning chains, this reliance on verbose generation results in significant latency and computational overhead. To address these challenges, we propose CoSMo (Consistency-Guided Split-Merge Optimization), a framework designed to eliminate structural redundancy rather than indiscriminately restricting token volume. Specifically, CoSMo utilizes a split-merge algorithm that dynamically refines reasoning chains by merging redundant segments and splitting logical gaps to ensure coherence. We then employ structure-aligned reinforcement learning with a novel segment-level budget to supervise the model in maintaining efficient reasoning structures throughout training. Extensive experiments across multiple benchmarks and backbones demonstrate that CoSMo achieves superior performance, improving accuracy by 3.3 points while reducing segment usage by 28.7% on average compared to reasoning efficiency baselines.
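The abstract describes the merge half of the split-merge pass only at a high level; the paper's actual consistency criterion and segment representation are not given here. The sketch below is a minimal illustration of merging redundant adjacent segments, assuming a simple Jaccard word-overlap score as a stand-in for the consistency signal; `merge_redundant` and the `threshold` value are hypothetical names, not the authors' implementation.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard overlap between two reasoning segments."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def merge_redundant(segments: list[str], threshold: float = 0.6) -> list[str]:
    """Greedily merge adjacent segments whose content overlaps heavily,
    keeping the longer (more informative) of each near-duplicate pair."""
    merged: list[str] = []
    for seg in segments:
        if merged and jaccard(merged[-1], seg) >= threshold:
            if len(seg) > len(merged[-1]):
                merged[-1] = seg
        else:
            merged.append(seg)
    return merged
```

A real system would score consistency with the model itself and also split segments that skip a reasoning step; this sketch covers only the redundancy-removal direction.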
Problem

Research questions and friction points this paper is trying to address.

Large Reasoning Models
reasoning efficiency
computational overhead
reasoning chains
latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Split-Merge Optimization
Reasoning Efficiency
Structural Redundancy
Segment-Level Budget
Consistency-Guided Reinforcement Learning
Runquan Gui
University of Science and Technology of China
Jie Wang
University of Science and Technology of China
Zhihai Wang
Qwen Team, PhD, USTC
Sample-Efficient Reinforcement Learning, RL4LLM, Agentic RL
Chi Ma
University of Science and Technology of China
Jianye Hao
Huawei Noah's Ark Lab/Tianjin University
Multiagent Systems, Embodied AI
Feng Wu
National University of Singapore
Machine Learning, Medical Time Series