🤖 AI Summary
This work investigates inference-time computation scaling to enhance language model performance without increasing model parameters or training cost, and examines its generalization to multimodal tasks. To this end, we propose Recursive Inference Scaling (RINS), the first framework that explicitly couples the fractal geometric structure of natural language with recursive reasoning depth. RINS introduces a fractal-inspired reasoning path, a recursive scheduling mechanism, and a data-driven scaling-law model. Compared to prior recursive methods such as the "repeat-all-over" (RAO) strategy, RINS achieves superior robustness, adaptability, and asymptotic performance ceilings. Experiments demonstrate that RINS significantly improves language modeling and multimodal understanding: on SigLIP-B/16, it boosts zero-shot ImageNet accuracy by +2%. Crucially, RINS supports on-demand deactivation of the additional inference overhead with negligible performance degradation, enabling efficient, context-aware deployment without architectural or training modifications.
📝 Abstract
Recent research in language modeling reveals two scaling effects: the well-known improvement from increased training compute, and a lesser-known boost from applying more sophisticated or computationally intensive inference methods. Inspired by recent findings on the fractal geometry of language, we introduce Recursive INference Scaling (RINS) as a complementary, plug-in recipe for scaling inference time. For a given fixed model architecture and training compute budget, RINS substantially improves language modeling performance. It also generalizes beyond pure language tasks, delivering gains in multimodal systems, including a +2% improvement in 0-shot ImageNet accuracy for SigLIP-B/16. Additionally, by deriving data scaling laws, we show that RINS improves both the asymptotic performance limits and the scaling exponents. These advantages are maintained even when compared to state-of-the-art recursive techniques like the "repeat-all-over" (RAO) strategy in Mobile LLM. Finally, stochastic RINS not only enhances performance further but also provides the flexibility to optionally forgo increased inference computation at test time with minimal performance degradation.
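To make the core idea concrete, the sketch below illustrates recursive inference with a shared block applied multiple times, plus a stochastic variant in which extra passes beyond the first can be skipped at test time. This is a minimal toy, not the paper's implementation: the function names, the `p_recurse` parameter, and the scalar "block" are all illustrative assumptions; the actual RINS recipe operates on network subblocks and a specific recursion schedule not reproduced here.

```python
import random

def recursive_apply(block, x, depth, p_recurse=1.0, rng=None):
    """Toy recursive inference: apply a shared `block` up to `depth` times.

    The first pass always runs (the base network). Each additional pass
    mimics an extra unit of inference compute; in the stochastic variant,
    each extra pass is taken with probability `p_recurse`, so setting
    p_recurse=0.0 "deactivates" the added inference cost entirely.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    h = block(x)                   # base pass, always executed
    for _ in range(depth - 1):
        if rng.random() < p_recurse:
            h = block(h)           # recurse: same weights, deeper compute
    return h

# Hypothetical "block": a contractive update standing in for a network layer.
f = lambda h: 0.5 * h + 1.0

full = recursive_apply(f, 0.0, depth=3, p_recurse=1.0)  # 3 passes
lite = recursive_apply(f, 0.0, depth=3, p_recurse=0.0)  # 1 pass only
```

With `p_recurse=1.0` the block runs all three times (0 → 1.0 → 1.5 → 1.75); with `p_recurse=0.0` only the base pass runs (0 → 1.0), mirroring the option to forgo extra inference compute at deployment.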