🤖 AI Summary
This work addresses the inefficiency of prevailing test-time compute methods, which typically employ a planning-first strategy that incurs redundant overhead on problems solvable without explicit planning. To overcome this limitation, we propose Planning-after-Trial (PaT), a verification-driven adaptive scheduling mechanism that invokes a heavyweight planning model only when outputs from a lightweight generative model fail verification. By combining heterogeneous language model collaboration with test-time compute scaling, PaT substantially advances the cost-performance Pareto frontier across multiple benchmarks, matching the performance of a large homogeneous model at approximately 69% lower inference cost.
📝 Abstract
Beyond training-time optimization, scaling test-time computation has emerged as a key paradigm to extend the reasoning capabilities of Large Language Models (LLMs). However, most existing methods adopt a rigid Planning-before-Trial (PbT) policy, which inefficiently allocates test-time compute by incurring planning overhead even on directly solvable problems. We propose Planning-after-Trial (PaT), an adaptive policy for code generation that invokes a planner only upon verification failure. This adaptive policy naturally enables a heterogeneous model configuration: a cost-efficient model handles generation attempts, while a powerful model is reserved for targeted planning interventions. Empirically, across multiple benchmarks and model families, our approach significantly advances the cost-performance Pareto frontier. Notably, our heterogeneous configuration achieves performance comparable to a large homogeneous model while reducing inference cost by approximately 69%.
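The control flow contrasted above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the callables `generate`, `plan`, and `verify`, the `CostLedger` class, and the relative unit costs (1 per generator call, 10 per planner call) are all hypothetical assumptions chosen only to show where planner calls are spent under PbT versus PaT. The toy stand-ins treat problems named `easy-*` as solvable without a plan.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical model interfaces (assumptions, not the paper's API):
#   generate(problem, guidance) -> candidate code; guidance is None for a direct trial
#   plan(problem, failed_attempt) -> plan text; failed_attempt is None under PbT
#   verify(problem, candidate) -> True if the candidate passes verification
Generate = Callable[[str, Optional[str]], str]
Plan = Callable[[str, Optional[str]], str]
Verify = Callable[[str, str], bool]


@dataclass
class CostLedger:
    """Counts calls per role and prices them with assumed relative unit costs."""
    generator_cost: float = 1.0   # assumed cost of one cheap-generator call
    planner_cost: float = 10.0    # assumed cost of one strong-planner call
    generator_calls: int = 0
    planner_calls: int = 0

    @property
    def total(self) -> float:
        return (self.generator_calls * self.generator_cost
                + self.planner_calls * self.planner_cost)


def planning_after_trial(problem: str, generate: Generate, plan: Plan,
                         verify: Verify, ledger: CostLedger,
                         max_attempts: int = 3) -> Optional[str]:
    """PaT: try directly first; call the planner only after a verification failure."""
    ledger.generator_calls += 1
    candidate = generate(problem, None)            # direct trial, no plan
    for _ in range(max_attempts - 1):
        if verify(problem, candidate):
            return candidate
        ledger.planner_calls += 1
        guidance = plan(problem, candidate)        # planner sees the failed trial
        ledger.generator_calls += 1
        candidate = generate(problem, guidance)
    return candidate if verify(problem, candidate) else None


def planning_before_trial(problem: str, generate: Generate, plan: Plan,
                          verify: Verify, ledger: CostLedger,
                          max_attempts: int = 3) -> Optional[str]:
    """PbT baseline: always plan before every generation attempt."""
    for _ in range(max_attempts):
        ledger.planner_calls += 1
        guidance = plan(problem, None)
        ledger.generator_calls += 1
        candidate = generate(problem, guidance)
        if verify(problem, candidate):
            return candidate
    return None


# Toy stand-ins: "easy" problems are solvable without a plan; others need one.
def toy_generate(problem: str, guidance: Optional[str]) -> str:
    return "ok" if problem.startswith("easy") or guidance is not None else "bug"


def toy_plan(problem: str, failed_attempt: Optional[str]) -> str:
    return f"plan for {problem}"


def toy_verify(problem: str, candidate: str) -> bool:
    return candidate == "ok"


problems = ["easy-1", "easy-2", "hard-1"]
pat, pbt = CostLedger(), CostLedger()
pat_results = [planning_after_trial(p, toy_generate, toy_plan, toy_verify, pat)
               for p in problems]
pbt_results = [planning_before_trial(p, toy_generate, toy_plan, toy_verify, pbt)
               for p in problems]
print(pat.total, pbt.total)  # 14.0 33.0
```

Under these toy assumptions both policies solve all three problems, but PaT invokes the planner only for the one problem whose direct trial fails, whereas PbT pays for a planner call on every problem. The gap narrows as the share of problems that need a plan grows, and the actual savings depend on real model prices and verification outcomes, which this sketch does not model.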