Thinking Long, but Short: Stable Sequential Test-Time Scaling for Large Reasoning Models

πŸ“… 2026-01-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work proposes Min-Seek, a training-free dynamic KV cache management mechanism that addresses the accuracy degradation and instability commonly observed when existing sequential inference methods scale to longer contexts, where they often require cumbersome length-specific tuning. By discarding positional encodings and retaining only the most recent key-value (KV) pairs, Min-Seek dynamically re-encodes the context before each decoding step, enabling continuous inference beyond the model's native context window. This approach eliminates inference-length hyperparameter tuning, significantly improves long-sequence reasoning accuracy across diverse tasks, and maintains linear computational complexity, achieving both efficiency and robustness.

πŸ“ Abstract
Sequential test-time scaling is a promising training-free method to improve large reasoning model accuracy, but current implementations exhibit significant limitations. Inducing models to think for longer can increase their accuracy, but extending the length of reasoning further has also been shown to cause accuracy degradation and model instability. This work presents a novel sequential test-time scaling method, Min-Seek, which improves model accuracy significantly over a wide range of induced thoughts, stabilizes the accuracy of sequential scaling, and removes the need for reasoning-length fine-tuning. Beyond improving model accuracy on a variety of reasoning tasks, our method is inherently efficient: only the KV pairs of one additional induced thought are kept in the KV cache during reasoning. With a custom KV cache that stores keys without position embeddings and dynamically encodes them contiguously before each newly generated thought, our method can continue to reason well beyond a model's maximum context length, and under mild conditions has linear computational complexity.
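The core mechanism the abstract describes can be illustrated with a minimal sketch: a KV cache that stores keys *without* rotary position embeddings (RoPE), evicts the oldest entries beyond a fixed budget, and re-applies RoPE with fresh contiguous positions 0..n-1 before each step, so attention positions never exceed the budget regardless of how long reasoning runs. This is an illustrative toy, not the paper's implementation; the class name, eviction policy, and RoPE base (10000) are assumptions.

```python
import numpy as np

def rope(x, pos, dim):
    """Apply rotary position embedding at integer positions `pos` to rows of x (n, dim)."""
    half = dim // 2
    freqs = 1.0 / (10000 ** (np.arange(half) / half))
    angles = np.outer(pos, freqs)                    # (n, half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

class PositionFreeKVCache:
    """Toy cache: keys are stored raw (no position encoding) and only
    re-encoded, contiguously from position 0, when attention needs them."""
    def __init__(self, budget, dim):
        self.budget = budget                         # max KV pairs retained
        self.dim = dim
        self.keys, self.vals = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.vals.append(v)
        if len(self.keys) > self.budget:             # evict the oldest pair
            self.keys.pop(0)
            self.vals.pop(0)

    def encoded(self):
        """Re-apply RoPE with contiguous positions 0..n-1 before decoding."""
        K = np.stack(self.keys)
        pos = np.arange(len(self.keys))              # always restarts at 0
        return rope(K, pos, self.dim), np.stack(self.vals)
```

Because positions are regenerated contiguously at every step, the model only ever sees positions in [0, budget), which is how such a cache can, in principle, keep decoding past the native context window without length-specific tuning.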
Problem

Research questions and friction points this paper is trying to address.

sequential test-time scaling
reasoning length
model instability
accuracy degradation
large reasoning models
Innovation

Methods, ideas, or system contributions that make the work stand out.

sequential test-time scaling
Min-Seek
KV cache optimization
position embedding
linear complexity