🤖 AI Summary
Existing test-time scaling (TTS) methods rely on redundant sampling and neglect the reuse of historical reasoning experience, resulting in low computational efficiency. This paper proposes Sticker-TTS, a TTS framework built on a "sticker" mechanism: it distills critical reasoning conditions into reusable, structured prompts ("stickers") that coordinate three collaborative large reasoning models in iterative exploration and refinement. The authors further design a two-stage optimization strategy combining imitation learning with self-improvement to enable efficient transfer of historical experience. Evaluated on the AIME-24, AIME-25, and OlymMATH mathematical reasoning benchmarks, Sticker-TTS outperforms self-consistency and reinforcement learning baselines under comparable inference budgets, achieving both higher accuracy and improved computational efficiency.
📝 Abstract
Large reasoning models (LRMs) have exhibited strong performance on complex reasoning tasks, with further gains achievable through increased computational budgets at inference. However, current test-time scaling methods predominantly rely on redundant sampling and ignore the utilization of historical experience, thereby limiting computational efficiency. To overcome this limitation, we propose Sticker-TTS, a novel test-time scaling framework that coordinates three collaborative LRMs to iteratively explore and refine solutions guided by historical attempts. At the core of our framework are distilled key conditions, termed stickers, which drive the extraction, refinement, and reuse of critical information across multiple rounds of reasoning. To further enhance the efficiency and performance of our framework, we introduce a two-stage optimization strategy that combines imitation learning with self-improvement, enabling progressive refinement. Extensive evaluations on three challenging mathematical reasoning benchmarks, including AIME-24, AIME-25, and OlymMATH, demonstrate that Sticker-TTS consistently surpasses strong baselines, including self-consistency and advanced reinforcement learning approaches, under comparable inference budgets. These results highlight the effectiveness of sticker-guided historical experience utilization. Our code and data are available at https://github.com/RUCAIBox/Sticker-TTS.
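The iterative, sticker-guided loop described above can be sketched in a few lines. This is a toy illustration only: the role split (explorer, verifier, sticker distiller), function names, and the dictionary schema for a sticker are assumptions for exposition, not the authors' actual implementation or API, and the toy stand-in functions replace real LRM calls.

```python
# Hypothetical sketch of a sticker-guided test-time scaling loop.
# All names and behaviors here are illustrative assumptions, not Sticker-TTS's real code.

def extract_sticker(problem, attempt):
    """Distill key conditions ('sticker') from a failed attempt (toy stand-in for an LRM)."""
    return {"problem": problem, "key_facts": list(attempt["facts"])}

def solve(problem, sticker=None):
    """Toy 'explorer': reuses the sticker's facts and derives one new one per round."""
    facts = list(sticker["key_facts"]) if sticker else []
    facts.append(f"fact_{len(facts)}")
    return {"facts": facts, "answer": len(facts)}

def verify(attempt, required_facts):
    """Toy 'verifier': accept once enough key conditions have been gathered."""
    return len(attempt["facts"]) >= required_facts

def sticker_tts(problem, max_rounds=3):
    """Iterate: solve, verify, and on failure distill a sticker for the next round."""
    sticker = None
    attempt = None
    for _ in range(max_rounds):
        attempt = solve(problem, sticker)      # explorer model proposes a solution
        if verify(attempt, max_rounds):        # verifier model checks it
            break
        sticker = extract_sticker(problem, attempt)  # distiller carries experience forward
    return attempt["answer"]

print(sticker_tts("toy problem"))  # accumulates stickers across 3 rounds, prints 3
```

The point of the sketch is the control flow: unlike self-consistency, which samples independent solutions, each round here starts from a compact summary of what previous rounds established, so the inference budget is spent on refinement rather than redundant sampling.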