Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning

πŸ“… 2025-02-12
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Autoregressive chain-of-thought (CoT) reasoning loses accuracy on complex problems whose solutions exceed the training sequence length. Method: We propose RELAY—a framework that (i) explicitly aligns the loop iterations of a Looped Transformer with CoT reasoning steps, and (ii) applies iteration-wise intermediate supervision so the looped model can reliably generate CoT for lengths unseen in training; it further employs a two-stage pipeline (looped-model chain generation → autoregressive fine-tuning) to enhance the long-range reasoning capability of autoregressive models. Contribution/Results: RELAY overcomes the transfer bottleneck of Looped Transformers in CoT generation, achieving strong performance on GSM8K and MMLU benchmarks. It enables accurate, controllable CoT generation far beyond training length, demonstrating significant gains in long-range reasoning fidelity and generalization.

πŸ“ Abstract
Chain-of-Thought (CoT) prompting has emerged as a powerful technique for enhancing language models' reasoning capabilities. However, generating long and correct CoT trajectories is challenging. Recent studies have demonstrated that Looped Transformers possess remarkable length generalization capabilities, but their limited generality and adaptability prevent them from serving as an alternative to auto-regressive solutions. To better leverage the strengths of Looped Transformers, we propose RELAY (REasoning through Loop Alignment iterativelY). Specifically, we align the steps of Chain-of-Thought (CoT) reasoning with loop iterations and apply intermediate supervision during the training of Looped Transformers. This additional iteration-wise supervision not only preserves the Looped Transformer's ability for length generalization but also enables it to predict CoT reasoning steps for unseen data. Therefore, we leverage this Looped Transformer to generate accurate reasoning chains for complex problems that exceed the training length, which are then used to fine-tune an auto-regressive model. We conduct extensive experiments, and the results demonstrate the effectiveness of our approach, with significant improvements in the performance of the auto-regressive model. Code will be released at https://github.com/qifanyu/RELAY.
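The core training idea in the abstract—applying a weight-tied Transformer block in a loop and supervising the read-out after each iteration with the corresponding CoT step—can be sketched as follows. This is a minimal illustration, not the paper's implementation; the model sizes, the single-block architecture, and the `relay_style_loss` helper are all assumptions for the sketch.

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """Toy looped Transformer: one weight-tied block applied n_loops times,
    with logits read out after every iteration (sketch, not the paper's code)."""
    def __init__(self, vocab_size=32, d_model=64, n_heads=4, n_loops=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=128, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)
        self.n_loops = n_loops

    def forward(self, tokens):
        h = self.embed(tokens)
        per_iter_logits = []
        for _ in range(self.n_loops):
            h = self.block(h)                     # same weights every iteration
            per_iter_logits.append(self.head(h))  # read out after each loop
        return per_iter_logits

def relay_style_loss(model, tokens, step_targets):
    """Iteration-wise supervision (hypothetical helper): loop iteration i is
    supervised with the tokens of CoT step i, so iterations align with steps."""
    per_iter_logits = model(tokens)
    loss = 0.0
    for logits, target in zip(per_iter_logits, step_targets):
        loss = loss + nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), target.reshape(-1))
    return loss / len(per_iter_logits)
```

Because the same block is reused, the number of iterations (and thus supervised CoT steps) can in principle be increased at inference time beyond what was seen in training, which is the length-generalization property the paper exploits.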
Problem

Research questions and friction points this paper is trying to address.

Enhance CoT reasoning with loop alignment
Improve Looped Transformers' adaptability and generality
Generate accurate reasoning chains for complex problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Loop-aligned reasoning enhancement
Intermediate supervision in training
Length generalization in transformers
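The second stage described above—using the length-generalizing looped model to label problems beyond the auto-regressive model's training length, then fine-tuning on those chains—can be sketched as a small distillation pipeline. The function names and interfaces here are assumptions for illustration; the paper's actual generation and fine-tuning code is in the linked repository.

```python
from typing import Callable, List, Sequence, Tuple

def generate_chain(looped_model: Callable[[str], List[str]], problem: str) -> List[str]:
    """Assumed interface: the looped model emits one CoT step per loop iteration."""
    return looped_model(problem)

def relay_distill(looped_model: Callable[[str], List[str]],
                  ar_finetune: Callable[[List[Tuple[str, List[str]]]], object],
                  long_problems: Sequence[str]):
    """RELAY stage-2 sketch (hypothetical helper): build a dataset of
    (problem, reasoning chain) pairs from the looped model, then hand it
    to an auto-regressive fine-tuning routine."""
    dataset = [(p, generate_chain(looped_model, p)) for p in long_problems]
    return ar_finetune(dataset)
```

The design choice this sketch reflects: the looped model is only a data labeler, so the deployed model remains a standard auto-regressive Transformer with no architectural changes.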
πŸ”Ž Similar Papers
No similar papers found.