🤖 AI Summary
Large language models struggle to self-improve without external feedback or additional training, primarily due to difficulties in generating high-quality candidate solutions and selecting correct answers in an unsupervised setting. This work proposes the Test-time Recursive Thinking (TRT) framework, which enables iterative self-optimization during inference through strategy-guided reasoning, knowledge accumulation, and self-generated validation signals. TRT achieves effective self-improvement for the first time without relying on reinforcement learning or human annotations, establishing an end-to-end test-time optimization pipeline. Experimental results demonstrate that open-source models attain 100% accuracy on AIME-25/24, while closed-source models show performance gains of 10.4–14.8 percentage points on the most challenging problems in LiveCodeBench.
📝 Abstract
Modern Large Language Models (LLMs) have shown rapid improvements in reasoning capabilities, driven largely by reinforcement learning (RL) with verifiable rewards. Here, we ask whether these LLMs can self-improve without additional training. We identify two core challenges for such systems: (i) efficiently generating diverse, high-quality candidate solutions, and (ii) reliably selecting correct answers in the absence of ground-truth supervision. To address these challenges, we propose Test-time Recursive Thinking (TRT), an iterative self-improvement framework that conditions generation on rollout-specific strategies, accumulated knowledge, and self-generated verification signals. Using TRT, open-source models reach 100% accuracy on AIME-25/24, and on LiveCodeBench's most difficult problems, closed-source models improve by 10.4–14.8 percentage points without external feedback.
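The abstract's loop — generate diverse candidates under rollout-specific strategies and accumulated knowledge, score them with self-generated verification signals, and fold the best result back into the knowledge pool — can be sketched as follows. This is a minimal toy simulation, not the paper's implementation: the function names (`generate`, `verify`, `trt`), the strategy list, and the scoring logic are all illustrative assumptions, with stub functions standing in for LLM rollouts.

```python
import random

# Toy sketch of a TRT-style loop: each round generates candidates under
# several strategies plus accumulated knowledge, scores them with a
# self-generated (noisy, ground-truth-free) verification signal, and
# accumulates a knowledge note for the next round. All names are illustrative.

STRATEGIES = ["direct", "decompose", "work-backwards"]

def generate(problem, strategy, knowledge, rng):
    """Stand-in for an LLM rollout: returns a (candidate, quality) pair.
    Quality improves as knowledge accumulates, mimicking self-improvement."""
    base = rng.random()
    bonus = 0.1 * len(knowledge)  # accumulated knowledge helps later rounds
    return f"{strategy}-solution", min(1.0, base + bonus)

def verify(candidate, quality, rng):
    """Stand-in for self-verification: a noisy score, no ground truth used."""
    return quality + 0.05 * rng.random()

def trt(problem, rounds=4, seed=0):
    rng = random.Random(seed)
    knowledge, best, best_score = [], None, -1.0
    for _ in range(rounds):
        for strategy in STRATEGIES:          # diverse candidate rollouts
            cand, q = generate(problem, strategy, knowledge, rng)
            score = verify(cand, q, rng)     # self-generated signal only
            if score > best_score:
                best, best_score = cand, score
        knowledge.append(f"note-from-round-{len(knowledge)}")  # accumulate
    return best, best_score

answer, score = trt("toy problem")
print(answer, round(score, 3))
```

With a fixed seed the loop is deterministic, and running more rounds can only match or improve the best self-verified score, which is the qualitative behavior the framework relies on.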