ExecTune: Effective Steering of Black-Box LLMs with Guide Models

📅 2026-04-09

🤖 AI Summary

This work addresses the high recurring inference cost of black-box large language model (LLM) APIs and the tendency of existing guide-based methods to produce strategies that the core model cannot reliably execute under deployment constraints. The authors formalize Guide-Core Policies (GCoP), in which a guide model generates a structured strategy that a black-box core model executes, and show that end-to-end performance is governed by guide-averaged executability. They then propose ExecTune, a training recipe that combines teacher-guided acceptance sampling, supervised fine-tuning, and structure-aware reinforcement learning. Because only the guide is trained, the system can be adapted without retraining the core LLM. On mathematical reasoning and code-generation benchmarks, the approach improves accuracy by up to 9.2% over prior baselines while reducing inference cost by up to 22.4%. Notably, it lets Claude Haiku 3.5 outperform Sonnet 3.5 and come within 1.7% absolute accuracy of Sonnet 4 at 38% lower cost.

📝 Abstract

For large language models deployed through black-box APIs, recurring inference costs often exceed one-time training costs. This motivates composed agentic systems that amortize expensive reasoning into reusable intermediate representations. We study a broad class of such systems, termed Guide-Core Policies (GCoP), in which a guide model generates a structured strategy that is executed by a black-box core model. This abstraction subsumes base, supervised, and advisor-style approaches, which differ primarily in how the guide is trained. We formalize GCoP under a cost-sensitive utility objective and show that end-to-end performance is governed by guide-averaged executability: the probability that a strategy generated by the guide can be faithfully executed by the core. Our analysis shows that existing GCoP instantiations often fail to optimize executability under deployment constraints, resulting in brittle strategies and inefficient computation. Motivated by these insights, we propose ExecTune, a principled training recipe that combines teacher-guided acceptance sampling, supervised fine-tuning, and structure-aware reinforcement learning to directly optimize syntactic validity, execution success, and cost efficiency. Across mathematical reasoning and code-generation benchmarks, GCoP with ExecTune improves accuracy by up to 9.2% over prior state-of-the-art baselines while reducing inference cost by up to 22.4%. It enables Claude Haiku 3.5 to outperform Sonnet 3.5 on both math and code tasks, and to come within 1.7% absolute accuracy of Sonnet 4 at 38% lower cost. Beyond efficiency, GCoP also supports modular adaptation by updating the guide without retraining the core.
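To make the notion of guide-averaged executability concrete, the following is a minimal Python sketch of how it might be estimated by sampling. It is not the paper's implementation: the names `Outcome`, `GCoPEstimate`, `estimate_gcop`, and `cost_weight` are hypothetical, the `guide` and `core` callables stand in for a trained guide model and a black-box API call, and the linear form "accuracy minus weighted cost" is only one assumed instance of a cost-sensitive utility.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Outcome:
    """Result of the core running one guide strategy (hypothetical record)."""
    executed: bool  # did the core faithfully execute the strategy?
    correct: bool   # was the final answer correct?
    cost: float     # inference cost of this call, in arbitrary units


@dataclass
class GCoPEstimate:
    executability: float
    accuracy: float
    mean_cost: float
    utility: float


def estimate_gcop(
    tasks: Sequence[str],
    guide: Callable[[str], str],
    core: Callable[[str, str], Outcome],
    samples_per_task: int = 4,
    cost_weight: float = 0.1,
) -> GCoPEstimate:
    """Monte Carlo estimate over strategies sampled from the guide."""
    outcomes = []
    for task in tasks:
        for _ in range(samples_per_task):
            strategy = guide(task)                  # guide proposes a strategy
            outcomes.append(core(task, strategy))   # black-box core executes it
    if not outcomes:
        raise ValueError("need at least one task and one sample")

    n = len(outcomes)
    executability = sum(o.executed for o in outcomes) / n
    accuracy = sum(o.correct for o in outcomes) / n
    mean_cost = sum(o.cost for o in outcomes) / n
    # Assumed utility: accuracy penalized linearly by average cost.
    utility = accuracy - cost_weight * mean_cost
    return GCoPEstimate(executability, accuracy, mean_cost, utility)
```

In this sketch, executability is simply the fraction of sampled strategies that the core executes successfully, averaged over tasks and guide samples. A guide that emits malformed or unexecutable strategies lowers that fraction, which is the failure mode the abstract attributes to existing GCoP instantiations.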

Problem

Research questions and friction points this paper is trying to address.

Black-Box LLMs

Guide Models

Executability

Inference Cost

Structured Strategy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Guide-Core Policies

ExecTune

Black-Box LLMs