🤖 AI Summary
Existing LLM agent optimization methods focus solely on configuration tuning while neglecting graph-structural deficiencies, leading to suboptimal agent designs. Method: We propose the first framework for joint optimization of agent graph topology and node configurations. Our approach introduces a framework-agnostic global optimizer that combines reinforcement learning with gradient-informed heuristic search; it leverages reflective textual feedback from execution trajectories to guide iterative rollouts, enhancing sample efficiency and enabling precise localization of structural failures. Contributions/Results: (1) first end-to-end joint search over both structure and configuration; (2) novel use of interpretable textual feedback as an optimization signal; (3) average improvements of 12%, 4.9%, and 4.86% over MIPROv2, GEPA, and GEPA+Merge, respectively, across IFBench and HotpotQA, achieved with fewer rollouts; additionally, strong generalization is demonstrated on interview and RAG agents.
📝 Abstract
Building reliable LLM agents requires decisions at two levels: the graph (which modules exist and how information flows) and the configuration of each node (models, prompts, tools, control knobs). Most existing optimizers tune configurations while holding the graph fixed, leaving structural failure modes unaddressed. We introduce Maestro, a framework-agnostic holistic optimizer for LLM agents that jointly searches over graphs and configurations to maximize agent quality, subject to explicit rollout/token budgets. Beyond numeric metrics, Maestro leverages reflective textual feedback from traces to prioritize edits, improving sample efficiency and targeting specific failure modes. On the IFBench and HotpotQA benchmarks, Maestro consistently surpasses leading prompt optimizers (MIPROv2, GEPA, and GEPA+Merge) by averages of 12%, 4.9%, and 4.86%, respectively; even when restricted to prompt-only optimization, it still leads by 9.65%, 2.37%, and 2.41%. Maestro achieves these results with far fewer rollouts than GEPA. We further show large gains on two applications (interviewer and RAG agents), highlighting that joint graph-and-configuration search addresses structural failure modes that prompt tuning alone cannot fix.
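To make the two-level search concrete, here is a minimal, hypothetical sketch of a budgeted joint search over a graph and its configuration. All names (`evaluate`, `mutate_graph`, `mutate_config`, `joint_search`) are illustrative assumptions, not Maestro's API: a toy numeric score stands in for benchmark quality, and a greedy accept-if-better loop stands in for Maestro's RL plus gradient-informed heuristic search; the reflective textual-feedback signal is omitted for brevity.

```python
import random

def evaluate(graph, config):
    """Stand-in scorer: rewards up to 3 graph nodes plus the temperature knob.
    In a real system this would be a benchmark metric over agent rollouts."""
    return min(len(graph), 3) + config.get("temperature", 0.0)

def mutate_graph(graph):
    """Structural edit: add or remove a node in the agent graph."""
    g = list(graph)
    if g and random.random() < 0.5:
        g.pop(random.randrange(len(g)))
    else:
        g.append(f"node{len(g)}")
    return g

def mutate_config(config):
    """Configuration edit: perturb one knob (here, sampling temperature)."""
    c = dict(config)
    t = c.get("temperature", 0.5) + random.uniform(-0.2, 0.2)
    c["temperature"] = min(1.0, max(0.0, t))
    return c

def joint_search(budget=200, seed=0):
    """Greedy joint search under an explicit rollout budget: each rollout
    proposes either a structural or a configuration edit and keeps it
    only if the (toy) score improves."""
    random.seed(seed)
    graph, config = ["node0"], {"temperature": 0.5}
    best = evaluate(graph, config)
    for _ in range(budget):
        if random.random() < 0.5:
            cand_g, cand_c = mutate_graph(graph), config
        else:
            cand_g, cand_c = graph, mutate_config(config)
        score = evaluate(cand_g, cand_c)
        if score > best:  # accept only improving edits
            graph, config, best = cand_g, cand_c, score
    return graph, config, best
```

The sketch illustrates why configuration-only tuning is limited: if `mutate_graph` is disabled, the score is capped by the initial one-node structure no matter how well the temperature is tuned, mirroring the structural failure modes the abstract describes.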