Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging

📅 2025-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often exhibit “overthinking” on complex tasks, generating excessively long, inefficient reasoning chains with only marginal quality gains. Method: this paper presents the first systematic exploration of model merging for Long-to-Short (L2S) reasoning, evaluating three strategies (task-vector-based, SVD-based, and activation-informed merging) across 1.5B to 32B LLMs. Contribution/Results: model scale critically affects merging efficacy, and merged models inherently support self-critique, self-correction, and adaptive response-length control per task. Empirically, merging reduces average response length by up to 55% while maintaining or improving reasoning quality, and it shows strong efficiency, stability, and generalization across diverse multitask benchmarks. This work provides a scalable, low-overhead pathway to reconcile System 1-style speed with System 2-style depth in LLM inference.
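
To make the task-vector strategy concrete, the snippet below sketches plain task-arithmetic interpolation between a quick-thinking base model and a long-CoT reasoning model. The model paths and the coefficient alpha are placeholders rather than the paper's reported configuration; this is only a minimal illustration of the idea.

```python
# Rough sketch of task-vector (task-arithmetic) merging for Long-to-Short reasoning.
# All model paths and the coefficient ALPHA are placeholders, not the paper's
# reported configuration.
import torch
from transformers import AutoModelForCausalLM

BASE_ID = "path/to/quick-thinking-base"           # System 1-style instruct model (placeholder)
REASONER_ID = "path/to/long-cot-reasoning-model"  # System 2-style long-CoT model (placeholder)
ALPHA = 0.5                                       # merge strength; tuned per model scale in practice

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
reasoner = AutoModelForCausalLM.from_pretrained(REASONER_ID, torch_dtype=torch.bfloat16)

base_state = base.state_dict()
reasoner_state = reasoner.state_dict()
merged_state = {}
for name, w_base in base_state.items():
    if w_base.is_floating_point():
        # task vector: what the long-CoT model learned on top of the quick-thinking model
        tau = reasoner_state[name] - w_base
        merged_state[name] = w_base + ALPHA * tau
    else:
        merged_state[name] = w_base  # leave integer buffers untouched

base.load_state_dict(merged_state)
base.save_pretrained("merged-l2s-model")
```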

📝 Abstract
The transition from System 1 to System 2 reasoning in large language models (LLMs) has marked significant advancements in handling complex tasks through deliberate, iterative thinking. However, this progress often comes at the cost of efficiency, as models tend to overthink, generating redundant reasoning steps without proportional improvements in output quality. Long-to-Short (L2S) reasoning has emerged as a promising solution to this challenge, aiming to balance reasoning depth with practical efficiency. While existing approaches, such as supervised fine-tuning (SFT), reinforcement learning (RL), and prompt engineering, have shown potential, they are either computationally expensive or unstable. Model merging, on the other hand, offers a cost-effective and robust alternative by integrating the quick-thinking capabilities of System 1 models with the methodical reasoning of System 2 models. In this work, we present a comprehensive empirical study on model merging for L2S reasoning, exploring diverse methodologies, including task-vector-based, SVD-based, and activation-informed merging. Our experiments reveal that model merging can reduce average response length by up to 55% while preserving or even improving baseline performance. We also identify a strong correlation between model scale and merging efficacy with extensive evaluations on 1.5B/7B/14B/32B models. Furthermore, we investigate the merged model's ability to self-critique and self-correct, as well as its adaptive response length based on task complexity. Our findings highlight model merging as a highly efficient and effective paradigm for L2S reasoning, offering a practical solution to the overthinking problem while maintaining the robustness of System 2 reasoning. This work can be found on GitHub: https://github.com/hahahawu/Long-to-Short-via-Model-Merging.
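
For the SVD-based variant named in the abstract, one minimal sketch (under assumptions) is to replace the task vector, i.e. the reasoning model's weights minus the base model's, with a truncated-SVD low-rank approximation before adding it back. The rank, the scaling factor, and the handling of non-matrix parameters below are illustrative choices, not the paper's exact recipe.

```python
# Rough sketch of an SVD-based merge: the task vector (reasoning model minus base model)
# is replaced by a truncated-SVD low-rank approximation before being added back.
# Rank, scaling, and the fallback for non-matrix parameters are illustrative assumptions.
import torch

def lowrank_task_vector(w_base: torch.Tensor, w_expert: torch.Tensor, rank: int = 64) -> torch.Tensor:
    """Rank-`rank` approximation of (w_expert - w_base) for a 2-D weight matrix."""
    tau = (w_expert - w_base).float()
    U, S, Vh = torch.linalg.svd(tau, full_matrices=False)
    r = min(rank, S.numel())
    return (U[:, :r] * S[:r]) @ Vh[:r, :]

def svd_merge(base_state: dict, expert_state: dict, alpha: float = 0.5, rank: int = 64) -> dict:
    """Merge two state dicts that share the same architecture."""
    merged = {}
    for name, w_base in base_state.items():
        w_expert = expert_state[name]
        if w_base.is_floating_point() and w_base.ndim == 2:
            # keep only the dominant singular directions of the task vector
            delta = lowrank_task_vector(w_base, w_expert, rank).to(w_base.dtype)
            merged[name] = w_base + alpha * delta
        elif w_base.is_floating_point():
            # 1-D parameters (norms, biases): plain interpolation as a fallback
            merged[name] = w_base + alpha * (w_expert - w_base)
        else:
            merged[name] = w_base
    return merged
```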
Problem

Research questions and friction points this paper is trying to address.

Balancing reasoning depth with efficiency in LLMs
Reducing redundant reasoning steps without quality loss
Integrating System 1 and System 2 model capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model merging balances System 1 speed with System 2 reasoning depth (a sketch of the activation-informed variant follows this list)
Reduces average response length by up to 55% without performance loss
Merging efficacy correlates strongly with model scale, evaluated on 1.5B/7B/14B/32B models
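
As referenced in the list above, here is a rough sketch of the activation-informed variant, under the assumption that per-channel activation statistics from a small calibration set gate how strongly the task vector is applied to each layer. This is one plausible reading of the idea, not the paper's exact algorithm.

```python
# Rough sketch (an assumption, not the paper's exact algorithm) of activation-informed
# merging for a single linear layer: calibration activations decide, per input channel,
# how strongly the task vector is applied, so channels the quick-thinking base model
# relies on most are disturbed the least.
import torch

def channel_importance(calib_acts: torch.Tensor) -> torch.Tensor:
    """calib_acts: (num_tokens, in_features) inputs to the layer on calibration prompts."""
    rms = calib_acts.float().pow(2).mean(dim=0).sqrt()   # per-input-channel activation RMS
    return rms / (rms.max() + 1e-8)                      # normalize to [0, 1]

def activation_informed_merge(w_base: torch.Tensor, w_expert: torch.Tensor,
                              calib_acts: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Merge one weight matrix of shape (out_features, in_features)."""
    tau = w_expert - w_base
    importance = channel_importance(calib_acts)          # (in_features,)
    # scale the task vector down on input channels that matter most to the base model
    scale = alpha * (1.0 - importance)
    return w_base + tau * scale.unsqueeze(0).to(tau.dtype)
```

In practice the calibration activations would be gathered with forward hooks over a handful of prompts; the point here is only to show how activation statistics can modulate merge strength.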
👥 Authors
Han Wu (Huawei Noah’s Ark Lab)
Yuxuan Yao (City University of Hong Kong)
Shuqi Liu (Huawei Noah’s Ark Lab)
Zehua Liu (Huawei Noah’s Ark Lab)
Xiaojin Fu (Huawei Noah’s Ark Lab)
Xiongwei Han (AI&OR Principal Researcher, Huawei Noah’s Ark Lab)
Xing Li (Huawei Noah’s Ark Lab)
Hui-Ling Zhen (Huawei, Hong Kong)
Tao Zhong (Huawei Noah’s Ark Lab)
Mingxuan Yuan (Huawei Noah’s Ark Lab)