Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pretrained non-recurrent language models tie reasoning depth to parameter count, which limits how far test-time computation can be scaled. Method: "recursification fine-tuning", an adaptation framework that converts an existing pretrained model into a depth-recurrent one, using a curriculum that progressively increases the number of recurrences over the course of training and thereby decouples parameter count from test-time compute. The approach combines a depth-recurrent architecture, stride-based parameter sharing, and dynamic sequence-length expansion to retrofit existing models without an architectural overhaul. Contribution/Results: On mathematical reasoning benchmarks, the converted models achieve higher accuracy than standard post-training of the original non-recurrent model under identical compute budgets, demonstrating better FLOPs-to-performance efficiency and enabling deeper, more resource-efficient reasoning without adding parameters.

📝 Abstract
Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments, on mathematics, we observe that converting pretrained models to recurrent ones results in better performance at a given compute budget than simply post-training the original non-recurrent language model.
Problem

Research questions and friction points this paper is trying to address.

How to convert pretrained non-recurrent models into depth-recurrent architectures
How to preserve performance while reducing total compute through recurrence
How to improve mathematical-reasoning accuracy within a fixed compute budget
Innovation

Methods, ideas, or system contributions that make the work stand out.

A conversion recipe that retrofits pretrained non-recurrent models into depth-recurrent ones
A curriculum of recurrences that grows the model's effective depth over the course of training
Better accuracy than post-training the original model at the same compute budget
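The core idea — a shared block applied repeatedly, with the repetition count growing on a curriculum — can be illustrated with a toy sketch. This is not the paper's implementation: the linear schedule, the layer shapes, and all function names here are illustrative assumptions.

```python
import numpy as np

def recurrence_schedule(step, total_steps, min_rec=1, max_rec=8):
    """Hypothetical curriculum: linearly increase the recurrence count
    from min_rec to max_rec over training (the paper's exact schedule
    may differ)."""
    frac = min(step / max(total_steps, 1), 1.0)
    return min_rec + int(frac * (max_rec - min_rec))

def depth_recurrent_forward(x, prelude, shared_block, coda, num_recurrences):
    """Apply one shared middle block num_recurrences times between a fixed
    prelude and coda: effective depth grows with num_recurrences while the
    parameter count stays constant."""
    h = prelude(x)
    for _ in range(num_recurrences):
        h = shared_block(h)  # same weights reused at every recurrence
    return coda(h)

# Toy linear layers standing in for transformer blocks.
rng = np.random.default_rng(0)
W_in, W_mid, W_out = (rng.standard_normal((4, 4)) * 0.1 for _ in range(3))
prelude = lambda x: x @ W_in
shared = lambda h: h + h @ W_mid   # residual-style shared block
coda = lambda h: h @ W_out

x = rng.standard_normal((2, 4))
for step in (0, 500, 1000):
    r = recurrence_schedule(step, total_steps=1000)
    y = depth_recurrent_forward(x, prelude, shared, coda, r)
    print(f"step={step}: {r} recurrences, output shape {y.shape}")
```

Early in training the model runs shallow and cheap; by the end it unrolls the shared block many times, which is how the curriculum reduces total training cost relative to training at full depth throughout.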