RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior

📅 2025-08-05
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
To address the degradation of chain-of-thought (CoT) reasoning capability and the output collapse that arise when merging large language models (LLMs), this paper proposes RCP-Merging, a framework that explicitly treats long CoT reasoning ability as prior knowledge. It introduces a reasoning capability indicator to safeguard the weights most critical to reasoning, and selectively injects domain-specific knowledge via weight-space fusion. Evaluated on Qwen2.5-7B, Qwen2.5-1.5B, and Llama3.1-8B in the biomedical and financial domains, RCP-Merging improves domain task performance by 9.5% (biomedicine) and 9.2% (finance) over state-of-the-art merging methods, while largely preserving the original long CoT reasoning capability and avoiding the gibberish and collapsed outputs seen with prior methods. This establishes a practical path to combining general reasoning ability and domain expertise in a single merged LLM.

📝 Abstract
Large Language Models (LLMs) with long chain-of-thought (CoT) capability, termed Reasoning Models, demonstrate superior problem-solving on intricate tasks through multi-step long CoT reasoning. To create a dual-capability model with both long CoT capability and domain-specific knowledge without substantial computational and data costs, model merging is a highly resource-efficient option. However, merging domain-specific LLMs with long CoT ones is challenging, since current merging methods suffer from reasoning capability degradation and can even produce gibberish output and output collapse. To overcome this, we introduce RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior, a novel merging framework designed to integrate domain-specific LLMs with long CoT ones while maintaining performance in the original domain. Treating the reasoning model's weights as a foundational prior, our method uses a reasoning capability indicator to preserve the weights core to long CoT capability while selectively merging essential domain-specific weights. We conducted extensive experiments on Qwen2.5-7B, Llama3.1-8B, and Qwen2.5-1.5B in the BioMedicine and Finance domains. Our results show that RCP-Merging successfully merges a reasoning model with domain-specific ones, improving domain task performance by 9.5% and 9.2% over state-of-the-art methods without significantly harming the original long CoT reasoning capability.
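The paper's code is not reproduced on this page, so the following is only a minimal PyTorch sketch of the general idea described above, not the authors' implementation. It assumes a Fisher-style squared-gradient score as the reasoning capability indicator and simple task-vector arithmetic for the domain weights; all names (`reasoning_indicator`, `rcp_merge`, `keep_ratio`, `alpha`) are hypothetical stand-ins.

```python
import torch


def reasoning_indicator(reasoning_model, calib_batch):
    """Score each parameter's sensitivity to long CoT reasoning.

    Hypothetical proxy: squared gradients (a Fisher-style score) of the
    loss on a few chain-of-thought calibration examples. Assumes a
    HuggingFace-style causal LM whose forward pass returns an object
    with a .loss field when labels are supplied in the batch.
    """
    reasoning_model.zero_grad()
    loss = reasoning_model(**calib_batch).loss
    loss.backward()
    return {
        name: param.grad.detach().pow(2)
        for name, param in reasoning_model.named_parameters()
        if param.grad is not None
    }


@torch.no_grad()
def rcp_merge(reasoning_sd, domain_sd, base_sd, scores,
              keep_ratio=0.3, alpha=1.0):
    """Merge state dicts, treating the reasoning weights as the prior.

    Coordinates whose indicator score falls in the top `keep_ratio`
    fraction stay frozen at their reasoning-model values; everywhere
    else the domain task vector (domain - base) is added.
    """
    merged = {}
    for name, w_reason in reasoning_sd.items():
        if name not in scores:
            # Buffers and ungraded tensors: keep the prior untouched.
            merged[name] = w_reason.clone()
            continue
        s = scores[name].flatten()
        k = max(1, int(keep_ratio * s.numel()))
        # The k-th largest score is the (numel - k + 1)-th smallest.
        threshold = s.kthvalue(s.numel() - k + 1).values
        protect = scores[name] >= threshold  # critical reasoning weights
        task_vector = domain_sd[name] - base_sd[name]
        merged[name] = torch.where(protect, w_reason,
                                   w_reason + alpha * task_vector)
    return merged
```

The design point this sketch mirrors is the asymmetry of the merge: the reasoning model's most sensitive weights are never moved, and domain knowledge (as a task vector) is injected only where the prior is judged less critical.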
Problem

Research questions and friction points this paper is trying to address.

Merge domain-specific LLMs with long CoT models efficiently
Prevent reasoning capability degradation during model merging
Enhance domain task performance without sacrificing CoT ability
Innovation

Methods, ideas, or system contributions that make the work stand out.

A reasoning capability indicator that identifies and protects weights critical to long CoT reasoning
Treating the reasoning model's weights as a foundational prior during merging
Selective weight-space fusion that injects domain knowledge without CoT degradation or output collapse
👥 Authors
Junyao Yang
South China University of Technology, China
Jianwei Wang
South China University of Technology, China
Huiping Zhuang
Associate Professor, South China University of Technology
Continual Learning · Multi-Modal · Embodied AI · Large Model
Cen Chen
South China University of Technology, China
Ziqian Zeng
Associate Professor at South China University of Technology
Natural Language Processing