🤖 AI Summary
This paper investigates whether large language models (LLMs) possess a generalizable long chain-of-thought (long CoT) reasoning capability. It identifies a latent, transferable long-chain reasoning representation in LLMs' hidden space, distinct from that of standard CoT, but finds that its effective activation requires coupling with domain-specific information.
Method: To address this, the authors propose GLoRE (Generalized Long-chain Reasoning Enhancement), a representation-engineering framework that achieves task-agnostic representation disentanglement and cross-domain recalibration, augmented with lightweight adapter-based fine-tuning.
Contribution/Results: GLoRE significantly improves performance across diverse long-chain reasoning benchmarks, enabling efficient few-shot cross-domain transfer. Notably, both reasoning length and generalization capacity scale jointly. This work provides the first representational evidence that long CoT is a fundamental, general capability of LLMs, establishing a novel paradigm for controllable reasoning capability elicitation.
📝 Abstract
Recent advancements in long chain-of-thought (long CoT) have significantly improved the reasoning capabilities of large language models (LLMs). Existing work finds that long CoT reasoning can be elicited efficiently by tuning on only a few examples and transfers easily to other tasks. This motivates us to investigate whether long CoT reasoning is a general capability of LLMs. In this work, we conduct an empirical analysis of this question from the perspective of representation. We find that LLMs do encode long CoT reasoning as a general capability, clearly distinct from vanilla CoT. Furthermore, domain-specific representations are also required for the effective transfer of long CoT reasoning. Inspired by these findings, we propose GLoRE, a novel representation-engineering method that unleashes the general long CoT reasoning capabilities of LLMs. Extensive experiments demonstrate the effectiveness and efficiency of GLoRE in both in-domain and cross-domain scenarios.
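To make the representation-engineering idea concrete, here is a minimal sketch of one common recipe: computing a difference-of-means "steering" direction that separates long-CoT hidden states from vanilla-CoT hidden states, then adding that scaled direction to a hidden state at inference time. This is a generic illustration of the technique family, not the paper's actual GLoRE procedure; the function names and the `alpha` scaling parameter are assumptions for demonstration.

```python
import numpy as np

def steering_vector(long_cot_acts: np.ndarray, vanilla_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference-of-means direction separating long-CoT from
    vanilla-CoT hidden states (a standard representation-engineering recipe;
    GLoRE's actual disentanglement/recalibration may differ).

    Both inputs have shape (num_examples, hidden_dim).
    """
    v = long_cot_acts.mean(axis=0) - vanilla_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden: np.ndarray, v: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Shift a hidden state along the long-CoT direction at inference time."""
    return hidden + alpha * v

# Synthetic demonstration: hidden states drawn around two different means.
rng = np.random.default_rng(0)
long_acts = rng.normal(1.0, 0.1, size=(8, 4))   # stand-in for long-CoT activations
vanilla_acts = rng.normal(0.0, 0.1, size=(8, 4))  # stand-in for vanilla-CoT activations

v = steering_vector(long_acts, vanilla_acts)
steered = steer(np.zeros(4), v, alpha=2.0)  # moves the state 2 units along v
```

In practice the activations would be collected from a chosen transformer layer (e.g. via forward hooks) on paired long-CoT and vanilla-CoT prompts, and the shift applied to that same layer during generation.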