🤖 AI Summary
Existing test-time scaling (TTS) methods rely on redundant sampling and neglect the reuse of historical reasoning experience, resulting in low computational efficiency. This paper proposes Sticker-TTS, a TTS framework built on a "sticker" mechanism: it distills critical reasoning conditions into reusable, structured prompts ("stickers") that coordinate three collaborative large reasoning models in iterative exploration and refinement. The authors further design a two-stage optimization strategy combining imitation learning with self-improvement to enable efficient transfer of historical experience. Evaluated on the AIME-24, AIME-25, and OlymMATH mathematical reasoning benchmarks, Sticker-TTS outperforms self-consistency and reinforcement learning baselines under comparable inference budgets, achieving both higher accuracy and improved computational efficiency.
📝 Abstract
Large reasoning models (LRMs) have exhibited strong performance on complex reasoning tasks, with further gains achievable through increased computational budgets at inference. However, current test-time scaling methods predominantly rely on redundant sampling and ignore the utilization of historical experience, thereby limiting computational efficiency. To overcome this limitation, we propose Sticker-TTS, a novel test-time scaling framework that coordinates three collaborative LRMs to iteratively explore and refine solutions guided by historical attempts. At the core of our framework are distilled key conditions, termed stickers, which drive the extraction, refinement, and reuse of critical information across multiple rounds of reasoning. To further enhance the efficiency and performance of our framework, we introduce a two-stage optimization strategy that combines imitation learning with self-improvement, enabling progressive refinement. Extensive evaluations on three challenging mathematical reasoning benchmarks, including AIME-24, AIME-25, and OlymMATH, demonstrate that Sticker-TTS consistently surpasses strong baselines, including self-consistency and advanced reinforcement learning approaches, under comparable inference budgets. These results highlight the effectiveness of sticker-guided historical experience utilization. Our code and data are available at https://github.com/RUCAIBox/Sticker-TTS.
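The iterative, sticker-guided loop described above can be sketched in a few lines. This is a toy illustration only: the role split (explorer, verifier, sticker distiller), function names, and the dictionary schema for a sticker are assumptions for exposition, not the authors' actual implementation or API, and the toy stand-in functions replace real LRM calls.

```python
# Hypothetical sketch of a sticker-guided test-time scaling loop.
# All names and behaviors here are illustrative assumptions, not Sticker-TTS's real code.

def extract_sticker(problem, attempt):
    """Distill key conditions ('sticker') from a failed attempt (toy stand-in for an LRM)."""
    return {"problem": problem, "key_facts": list(attempt["facts"])}

def solve(problem, sticker=None):
    """Toy 'explorer': reuses the sticker's facts and derives one new one per round."""
    facts = list(sticker["key_facts"]) if sticker else []
    facts.append(f"fact_{len(facts)}")
    return {"facts": facts, "answer": len(facts)}

def verify(attempt, required_facts):
    """Toy 'verifier': accept once enough key conditions have been gathered."""
    return len(attempt["facts"]) >= required_facts

def sticker_tts(problem, max_rounds=3):
    """Iterate: solve, verify, and on failure distill a sticker for the next round."""
    sticker = None
    attempt = None
    for _ in range(max_rounds):
        attempt = solve(problem, sticker)      # explorer model proposes a solution
        if verify(attempt, max_rounds):        # verifier model checks it
            break
        sticker = extract_sticker(problem, attempt)  # distiller carries experience forward
    return attempt["answer"]

print(sticker_tts("toy problem"))  # accumulates stickers across 3 rounds, prints 3
```

The point of the sketch is the control flow: unlike self-consistency, which samples independent solutions, each round here starts from a compact summary of what previous rounds established, so the inference budget is spent on refinement rather than redundant sampling.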