🤖 AI Summary
This work addresses the high memory and computational cost of autoregressive decoding when training large reasoning models with reinforcement learning, as well as the loss of long-context coherence caused by sliding-window caching. To overcome these limitations, the authors propose Progressive Thought Encoding (PTE), a method that compresses intermediate reasoning steps into compact vector representations within a fixed-size cache, removing the need to backpropagate through the full sequence. This substantially reduces training memory while keeping inference memory constant. PTE enables, for the first time, efficient long-horizon reasoning under parameter-efficient fine-tuning, overcoming the coherence limitations of existing caching strategies. Experiments on Qwen2.5-3B/7B and DeepSeek-R1-Distill-Llama-8B demonstrate average accuracy gains of 19.3% over LoRA and 29.9% over non-finetuned LRMs, with up to a +23.4-point improvement on the AIME2024/2025 benchmarks.
📝 Abstract
Large reasoning models (LRMs) excel on complex problems but face a critical efficiency barrier: reinforcement learning (RL) training requires long rollouts for outcome-based rewards, where autoregressive decoding dominates time and memory usage. While sliding-window cache strategies can bound memory, they disrupt long-context reasoning and degrade performance. We introduce Progressive Thought Encoding, a parameter-efficient fine-tuning method that enables LRMs to reason effectively under fixed-size caches. By progressively encoding intermediate reasoning into fixed-size vector representations, our approach eliminates the need to backpropagate through full-cache rollouts, reducing training memory usage while maintaining constant memory during inference. Experiments on three models (Qwen2.5-3B-Instruct, Qwen2.5-7B-Instruct, and DeepSeek-R1-Distill-Llama-8B) across six widely used, challenging mathematical benchmarks show consistent gains: our method achieves a +19.3% average improvement over LoRA-based fine-tuning and +29.9% over LRMs without fine-tuning, with up to +23.4 accuracy points on AIME2024/2025 under the same tight cache budgets. These results demonstrate that Progressive Thought Encoding not only improves reasoning accuracy but also makes RL training of LRMs substantially more efficient and scalable under real-world memory constraints.
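The core memory behavior described above, where a bounded cache holds compact summaries of older reasoning steps plus raw recent states, can be sketched as follows. This is a minimal illustration only: the abstract does not specify the encoder, so mean pooling stands in for the learned compression, and the class and method names are our own assumptions, not the paper's API.

```python
import numpy as np

class ProgressiveCache:
    """Hypothetical sketch: a fixed-size cache that progressively folds
    old reasoning states into compact summary vectors (mean pooling here
    is a stand-in for the learned encoding in the actual method)."""

    def __init__(self, capacity: int, summary_slots: int):
        assert summary_slots < capacity
        self.capacity = capacity            # total cache budget, in vectors
        self.summary_slots = summary_slots  # bound on the compressed region
        self.summaries = []                 # compact encodings of old steps
        self.recent = []                    # raw recent hidden states

    def append(self, h: np.ndarray) -> None:
        self.recent.append(h)
        if len(self.summaries) + len(self.recent) > self.capacity:
            self._compress()

    def _compress(self) -> None:
        # Fold the older half of the recent window into one summary vector.
        k = len(self.recent) // 2
        chunk, self.recent = self.recent[:k], self.recent[k:]
        self.summaries.append(np.mean(chunk, axis=0))
        # Keep the summary region itself bounded by merging the oldest pair.
        if len(self.summaries) > self.summary_slots:
            merged = np.mean(self.summaries[:2], axis=0)
            self.summaries = [merged] + self.summaries[2:]

    def state(self) -> np.ndarray:
        # What the model would attend over: summaries then raw recent states.
        return np.stack(self.summaries + self.recent)

cache = ProgressiveCache(capacity=8, summary_slots=3)
for _ in range(100):                     # many more steps than the budget
    cache.append(np.random.randn(4))
assert cache.state().shape[0] <= 8       # memory stays constant with sequence length
```

The point of the sketch is the invariant in the final assertion: however long the rollout grows, the attended state never exceeds the fixed budget, which is what removes the need to backpropagate through full-cache rollouts.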