Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates how transferring chain-of-thought (CoT) reasoning from one large language model to another affects the receiving model's answers. Using a provider-receiver framework, CoT prefix truncation, force-answer versus free-generation comparisons, and evaluation across multiple models and benchmarks, the work shows that CoT transfer operates through several pathways (answer extraction, reasoning scaffolding, and dependence on the receiver's own competence) rather than a single uniform mechanism. The authors also propose answer agreement among receivers, which requires no ground-truth labels, as a signal for stopping provider reasoning early. Across AIME, MMLU-Pro, and ZebraLogic, partial CoTs in free-generation mode guide continued reasoning and improve performance.
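
The prefix-truncation setup and the two answering modes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenization, the prefix fractions, the prompt wording, and the names `trace_prefixes` and `build_prompt` are all assumptions made for illustration.

```python
from typing import List


def trace_prefixes(trace: str, fractions: List[float]) -> List[str]:
    """Cut a provider trace into prefixes of increasing length.

    Assumption: the trace is split on whitespace and each fraction is a
    share of the word count. The paper may truncate differently.
    """
    words = trace.split()
    prefixes = []
    for fraction in fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("each fraction must lie in [0, 1]")
        cut = round(len(words) * fraction)
        prefixes.append(" ".join(words[:cut]))
    return prefixes


def build_prompt(question: str, prefix: str, mode: str) -> str:
    """Build a receiver prompt for one of the two modes (hypothetical wording)."""
    if mode == "force_answer":
        # The receiver must answer directly from the prefix.
        suffix = "\n\nFinal answer:"
    elif mode == "free_generation":
        # The receiver may keep reasoning before it answers.
        suffix = "\n\nContinue the reasoning if needed, then give the final answer."
    else:
        raise ValueError(f"unknown mode: {mode}")
    return f"Question: {question}\n\nPartial reasoning:\n{prefix}{suffix}"
```

Sweeping the fractions from 0 to 1 and scoring the receiver at each step gives the kind of prefix trajectory that the comparison between the two modes relies on.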

📝 Abstract

Large reasoning models (LRMs) often generate extensive chain-of-thought (CoT) traces before producing a final answer. As explicit textual artifacts, these traces can be passed to other models to solve the same task, enabling cross-model reasoning transfer. Yet successful transfer alone does not reveal how the provided CoT contributes to another model's answer. We study this question with a controlled provider-receiver framework, where a provider generates a reasoning trace and a receiver solves the same problem from progressively longer trace prefixes. We compare force-answer, where the receiver answers directly from the prefix, with free-generation, where it may continue reasoning before answering. Across models and benchmarks, full traces often transfer successfully, but prefix trajectories reveal distinct mechanisms. In force-answer mode, AIME transfer is largely driven by explicit answer availability. MMLU-Pro instead reflects a larger role for receiver competence, while ZebraLogic depends on partial structured-answer information rather than complete-answer leakage alone. In free-generation mode, partial CoTs improve performance across benchmarks, indicating that prefixes can guide continued reasoning. Finally, answer agreement among receivers provides a gold-free signal for stopping provider reasoning early. Overall, cross-model CoT transfer is not a single phenomenon: it can reflect answer extraction, reasoning scaffolding, or receiver-dependent competence.
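
The gold-free stopping signal mentioned in the abstract can be sketched as follows. The abstract does not state the exact rule, so the string normalization, the agreement threshold, and the names `agreement` and `first_stable_prefix` are assumptions for illustration only.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def agreement(answers: Sequence[str]) -> Tuple[Optional[str], float]:
    """Return the most common answer and the share of receivers that gave it.

    Assumption: answers are compared after stripping whitespace and
    lowercasing. No ground-truth label is used anywhere.
    """
    if not answers:
        return None, 0.0
    normalized = [answer.strip().lower() for answer in answers]
    top_answer, count = Counter(normalized).most_common(1)[0]
    return top_answer, count / len(normalized)


def first_stable_prefix(
    answers_by_prefix: Sequence[Sequence[str]], threshold: float = 1.0
) -> Optional[int]:
    """Index of the first prefix where receiver agreement reaches the threshold.

    answers_by_prefix[i] holds the receivers' answers for the i-th
    (progressively longer) prefix. Returns None if agreement never
    reaches the threshold (a hypothetical stopping rule).
    """
    for index, answers in enumerate(answers_by_prefix):
        _, share = agreement(answers)
        if share >= threshold:
            return index
    return None
```

For example, with three receivers whose answers over three prefixes are `["12", "7", "3"]`, `["42", "42", "7"]`, and `["42", "42", "42"]`, a threshold of 1.0 stops at the third prefix (index 2), while a looser threshold of 0.6 would stop at the second (index 1).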

Problem

Research questions and friction points this paper is trying to address.

chain-of-thought

reasoning transfer

cross-model

large reasoning models

answer generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

chain-of-thought transfer

cross-model reasoning

reasoning scaffolding