🤖 AI Summary
Existing test-time scaling methods increase computation uniformly across all samples and generation steps, ignoring the local difficulty of individual instances and wasting compute. To address this, we propose a locally adaptive test-time scaling framework that estimates the local difficulty of each generation step with a verifier model and, based on that estimate, triggers fine-grained control actions (resampling, backtracking, restarting, or early termination) to allocate computation adaptively. Our core contribution is a local difficulty awareness mechanism that departs from the conventional paradigm of global, uniform computation expansion. Extensive experiments across multiple tasks show that our method matches or improves accuracy while significantly reducing average computational cost, achieving a superior accuracy–efficiency trade-off.
📝 Abstract
One common strategy for improving the performance of Large Language Models (LLMs) on downstream tasks is to use a *verifier model* either to select the best answer from a pool of candidates or to steer the auto-regressive generation process towards better outputs. This class of methods typically improves accuracy at the cost of increased computation at test time, a paradigm known as *test-time scaling*. However, most existing approaches increase computation uniformly across all samples and generation steps, without considering the complexity of individual instances, leading to inefficient resource use. We address this limitation with an approach, called *Locally Adaptive Test-Time Scaling (LATTS)*, that allocates variable compute across generation steps. Specifically, at each generation step, LATTS employs a verifier-based acceptance criterion to decide whether to resample, backtrack, restart, or stop the generation process. This criterion adjusts the per-step computational effort based on a precise notion of *local difficulty* derived from the verifier model. Empirical results show that LATTS achieves significantly better accuracy–compute tradeoffs than standard verifier-based methods.
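The resample/backtrack/restart/stop loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch under stated assumptions, not the paper's implementation: `generate_step`, `verifier_score`, the acceptance threshold, and all budget parameters are hypothetical stand-ins for the LLM, the verifier, and the tuning knobs the method would expose.

```python
import random


def generate_step(prefix, rng):
    # Hypothetical stand-in for sampling one auto-regressive step from the LLM.
    return prefix + [rng.random()]


def verifier_score(prefix):
    # Hypothetical verifier: mean step score; higher means more promising.
    return sum(prefix) / len(prefix)


def latts_generate(max_len=8, accept_threshold=0.4, max_resamples=4,
                   max_restarts=2, step_budget=200, seed=0):
    """Locally adaptive loop: resample hard steps, backtrack when
    resampling fails, restart when backtracking bottoms out."""
    rng = random.Random(seed)
    steps = 0
    prefix = []
    for _ in range(max_restarts + 1):          # restart: wipe the prefix
        prefix = []
        while len(prefix) < max_len and steps < step_budget:
            accepted = False
            for _ in range(max_resamples):     # resample: retry this step
                steps += 1
                candidate = generate_step(prefix, rng)
                if verifier_score(candidate) >= accept_threshold:
                    prefix, accepted = candidate, True
                    break
            if not accepted:
                if prefix:
                    prefix = prefix[:-1]       # backtrack: drop last step
                else:
                    break                      # abandon attempt; restart
        if len(prefix) == max_len:
            return prefix                      # stop: generation accepted
    return prefix                              # budgets exhausted


if __name__ == "__main__":
    print(latts_generate())
```

The point of the sketch is the compute allocation: steps the verifier keeps rejecting consume extra samples (and may trigger backtracks or restarts), so computation concentrates where local difficulty is high, while easy steps are accepted on the first draw.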