Budget-Aware Anytime Reasoning with LLM-Synthesized Preference Data

📅 2026-01-16
🤖 AI Summary
This work addresses how large language models (LLMs) can produce high-quality intermediate reasoning outputs under a constrained computation budget. The authors propose an anytime reasoning framework in which LLM-synthesized preference data drives contrastive learning over reasoning paths, together with the Anytime Index, a metric for how solution quality improves as the reasoning budget grows. The approach lets the model keep refining its answer within a fixed budget, improving both output quality and inference efficiency. Experiments on NaturalPlan, AIME, and GPQA show consistent gains over baseline methods for Grok-3, GPT-series, and LLaMA models, which supports the effectiveness and generality of the self-improvement mechanism.

📝 Abstract
We study the reasoning behavior of large language models (LLMs) under limited computation budgets. In such settings, producing useful partial solutions quickly is often more practical than exhaustive reasoning, which incurs high inference costs. Many real-world tasks, such as trip planning, require models to deliver the best possible output within a fixed reasoning budget. We introduce an anytime reasoning framework and the Anytime Index, a metric that quantifies how effectively solution quality improves as reasoning tokens increase. To further enhance efficiency, we propose an inference-time self-improvement method using LLM-synthesized preference data, where models learn from their own reasoning comparisons to produce better intermediate solutions. Experiments on NaturalPlan (Trip), AIME, and GPQA datasets show consistent gains across Grok-3, GPT-oss, GPT-4.1/4o, and LLaMA models, improving both reasoning quality and efficiency under budget constraints.
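
As an illustration of what a metric like the Anytime Index could measure, the sketch below scores a quality-versus-budget curve by its normalized area under the curve. This is an assumption made for illustration only: the abstract does not give the formula, and the function name anytime_index_sketch, the [0, 1] quality scale, and the sample curves are all hypothetical.

```python
from typing import Sequence


def anytime_index_sketch(budgets: Sequence[float], qualities: Sequence[float]) -> float:
    """Normalized area under a quality-vs-token-budget curve (trapezoidal rule).

    Hypothetical stand-in for an anytime metric; the paper's definition may differ.
    Assumes qualities lie in [0, 1] and budgets are strictly increasing.
    """
    if len(budgets) != len(qualities) or len(budgets) < 2:
        raise ValueError("need at least two (budget, quality) points of equal length")
    if any(b2 <= b1 for b1, b2 in zip(budgets, budgets[1:])):
        raise ValueError("budgets must be strictly increasing")

    area = 0.0
    for i in range(1, len(budgets)):
        width = budgets[i] - budgets[i - 1]
        # Average quality over this budget interval, weighted by its width.
        area += width * (qualities[i] + qualities[i - 1]) / 2.0
    # Dividing by the budget range keeps the score in [0, 1].
    return area / (budgets[-1] - budgets[0])


# Two made-up models that reach the same final quality at 4000 reasoning tokens.
budgets = [0, 1000, 2000, 4000]
fast_riser = [0.0, 0.6, 0.8, 0.8]   # useful partial solutions early
slow_riser = [0.0, 0.1, 0.2, 0.8]   # only good near the end of the budget

fast_score = anytime_index_sketch(budgets, fast_riser)
slow_score = anytime_index_sketch(budgets, slow_riser)
print(f"fast riser: {fast_score:.2f}, slow riser: {slow_score:.2f}")
```

Under this assumed definition, the model that delivers good partial solutions early scores higher even though both reach the same final quality, which is the behavior an anytime metric is meant to reward.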
Problem

Research questions and friction points this paper is trying to address.

anytime reasoning
computation budget
reasoning efficiency
partial solutions
LLM inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

anytime reasoning
computation budget
LLM-synthesized preference data
self-improvement
Anytime Index
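
A minimal sketch of how preference data might be assembled from a model's own intermediate solutions, assuming pairwise comparison of candidates produced for the same prompt. The names PreferencePair, build_preference_pairs, and toy_judge are hypothetical; in practice the judge would be an LLM comparison, and how the resulting pairs are consumed is not specified on this page.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Sequence


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # intermediate solution the judge preferred
    rejected: str  # intermediate solution the judge ranked lower


def build_preference_pairs(
    prompt: str,
    candidates: Sequence[str],
    judge: Callable[[str, str, str], int],
) -> List[PreferencePair]:
    """Compare every pair of candidates and keep the non-tied verdicts.

    judge(prompt, a, b) returns > 0 if a is preferred, < 0 if b is, 0 for a tie.
    """
    pairs: List[PreferencePair] = []
    for a, b in combinations(candidates, 2):
        verdict = judge(prompt, a, b)
        if verdict > 0:
            pairs.append(PreferencePair(prompt, chosen=a, rejected=b))
        elif verdict < 0:
            pairs.append(PreferencePair(prompt, chosen=b, rejected=a))
    return pairs


def toy_judge(prompt: str, a: str, b: str) -> int:
    """Placeholder for an LLM comparison: prefers plans covering more cities."""
    cities = ["Rome", "Paris", "Berlin"]

    def coverage(plan: str) -> int:
        return sum(city in plan for city in cities)

    return coverage(a) - coverage(b)


trip_prompt = "Plan a trip visiting Rome, Paris, and Berlin."
partial_plans = [
    "Visit Rome.",
    "Visit Rome, then Paris.",
    "Visit Rome, Paris, and Berlin.",
]
preference_pairs = build_preference_pairs(trip_prompt, partial_plans, toy_judge)
for pair in preference_pairs:
    print(f"chosen: {pair.chosen!r}  rejected: {pair.rejected!r}")
```

The judge is passed in as a callable so the toy heuristic can be swapped for a model-based comparison without changing the pairing logic.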