AI Summary
This study addresses how to allocate computational resources under a fixed inference budget to maximize solution accuracy on competitive programming problems. Through systematic experiments on 216 Codeforces problems spanning Divisions 1-3, the authors compare agent-based reasoning against repeated independent sampling (k-shot) as a function of both the number of model calls and the associated cost, with prompt caching enabled in the agent frameworks. Across models and difficulty levels, k-shot consistently achieves a better accuracy-cost and accuracy-query trade-off than the more complex agent-based approaches. The work also shows that a cost-optimal solver minimizes "log failure likelihood per dollar", a principled criterion for solver design. For self-contained algorithmic tasks, the results indicate that simple sampling strategies can be more cost-effective than elaborate reasoning frameworks under realistic resource constraints.
Abstract
We study how to allocate inference-time compute for competitive programming under fixed budgets. Evaluating 216 Codeforces problems across Divisions 1-3, we compare agent-based reasoning with repeated independent sampling (k-shot) as a function of both cost and number of model calls. Across models and difficulty levels, k-shot consistently achieves a better accuracy-cost and accuracy-query tradeoff. This gap persists despite prompt caching in agent frameworks, indicating lower per-call effectiveness. Our results show that, for self-contained algorithmic tasks, independent exploration can outperform deeper agentic reasoning under realistic resource constraints. We also provide a budget-allocation analysis when the inference budget is fixed, and prove that a cost-optimal solver minimizes the principled metric log failure likelihood per dollar.
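To make the "log failure likelihood per dollar" criterion concrete, the sketch below shows one plausible reading of it, not the authors' derivation. It assumes that each call to a solver succeeds independently with a fixed probability `p_success` and costs a fixed `cost_per_call`, so that spending a budget `B` on that solver alone leaves a failure probability of `(1 - p_success) ** (B / cost_per_call) = exp(B * log(1 - p_success) / cost_per_call)`. Under those assumptions, the solver with the most negative `log(1 - p_success) / cost_per_call` has the lowest failure probability for any budget. The solver names and numbers are hypothetical and are not taken from the paper.

```python
import math


def log_failure_per_dollar(p_success: float, cost_per_call: float) -> float:
    """Return log(1 - p_success) / cost_per_call (more negative is better).

    Assumes independent calls with a fixed per-call success probability.
    """
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must be strictly between 0 and 1")
    if cost_per_call <= 0.0:
        raise ValueError("cost_per_call must be positive")
    return math.log(1.0 - p_success) / cost_per_call


def failure_probability(p_success: float, cost_per_call: float, budget: float) -> float:
    """Failure probability after spending `budget` on one solver.

    Uses the continuous relaxation exp(budget * rate), i.e. it allows a
    fractional number of calls (budget / cost_per_call).
    """
    return math.exp(budget * log_failure_per_dollar(p_success, cost_per_call))


# Hypothetical solvers: name -> (per-call success probability, dollars per call).
SOLVERS = {
    "cheap_sampler": (0.20, 0.01),
    "costly_agent": (0.50, 0.05),
}


def best_solver(solvers: dict) -> str:
    """Pick the solver that minimizes log failure likelihood per dollar."""
    return min(solvers, key=lambda name: log_failure_per_dollar(*solvers[name]))


if __name__ == "__main__":
    budget = 0.10  # dollars, hypothetical
    for name, (p, c) in SOLVERS.items():
        rate = log_failure_per_dollar(p, c)
        fail = failure_probability(p, c, budget)
        print(f"{name}: rate={rate:.2f} per dollar, failure at ${budget:.2f} = {fail:.3f}")
    print("preferred:", best_solver(SOLVERS))
```

In this toy setting the cheaper solver wins even though its per-call success rate is lower, because ten cheap attempts leave less residual failure probability than two expensive ones. Real agent runs are neither independent nor fixed-cost, so this only illustrates the form of the metric.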