🤖 AI Summary
Pretrained non-recurrent language models tie reasoning depth to a fixed parameter count, which limits how much computation they can spend at test time. Method: We propose “recursification fine-tuning”—an adaptation framework that converts an existing pretrained non-recurrent model into a depth-recurrent one, using a recursive curriculum that progressively increases the number of reasoning steps, thereby decoupling train-time compute and parameter count from test-time compute. The approach combines a deep recurrent architecture, stride-based parameter sharing, and dynamic sequence-length expansion to retrofit existing models without an architectural overhaul. Contribution/Results: On mathematical reasoning benchmarks, the converted recurrent models achieve significantly higher accuracy than standard post-training baselines under identical compute budgets—demonstrating superior FLOPs-to-performance efficiency. The method enables deeper, more resource-efficient reasoning while preserving model expressivity, pointing toward lightweight yet depth-enhanced large language models.
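The stride-based parameter sharing mentioned above can be sketched in plain Python. This is a minimal illustration under one plausible reading—that a deep layer stack cyclically reuses a small set of parameter groups, so depth grows without adding parameters. The function name, the cyclic mapping, and the stride values are assumptions for illustration, not the paper's actual scheme.

```python
def assign_shared_params(num_layers: int, stride: int) -> list[int]:
    """Map each layer index in a deep stack to the parameter group it reuses.

    Hypothetical cyclic variant: with stride 2, layers alternate between
    two parameter groups, so an 8-layer stack costs only 2 layers' worth
    of parameters while keeping 8 layers' worth of depth (compute).
    """
    return [i % stride for i in range(num_layers)]


# Eight layers of depth, but only two distinct parameter groups.
print(assign_shared_params(8, 2))  # [0, 1, 0, 1, 0, 1, 0, 1]
```

The design point this illustrates: parameter count scales with `stride`, while compute scales with `num_layers`, which is the decoupling the summary refers to.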
📝 Abstract
Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on mathematical reasoning, we observe that converting pretrained models into recurrent ones yields better performance at a given compute budget than simply post-training the original non-recurrent language model.
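The curriculum of recurrences described in the abstract can be sketched as follows. This is a hedged illustration, assuming a simple linear schedule that grows the recurrence count over training; the function names, the schedule shape, and the default range (1 to 8 recurrences) are assumptions, not the paper's recipe.

```python
def recurrence_curriculum(step: int, total_steps: int,
                          min_recurrences: int = 1,
                          max_recurrences: int = 8) -> int:
    """Return how many times to apply the shared recurrent block at this
    training step. Early steps are shallow (cheap); later steps reach full
    effective depth. Assumed linear schedule, for illustration only."""
    if total_steps <= 1:
        return max_recurrences
    frac = min(step / (total_steps - 1), 1.0)
    return min_recurrences + round(frac * (max_recurrences - min_recurrences))


def forward_with_recurrence(x, block, num_recurrences: int):
    """Apply one shared block repeatedly: parameter count stays fixed while
    compute (effective depth) scales with the recurrence count."""
    h = x
    for _ in range(num_recurrences):
        h = block(h)
    return h


# Toy usage: a "block" that just increments its input, so the output
# directly reflects how many recurrences were applied.
n = recurrence_curriculum(step=0, total_steps=1000)      # shallow early on
print(forward_with_recurrence(0, lambda h: h + 1, n))     # 1
n = recurrence_curriculum(step=999, total_steps=1000)    # full depth late
print(forward_with_recurrence(0, lambda h: h + 1, n))     # 8
```

Because early training steps run fewer recurrences, total training FLOPs drop relative to training at full depth throughout, which is the cost saving the abstract attributes to the curriculum.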