Optimizing Model Selection for Compound AI Systems

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of dynamic, module-level selection of large language models (LLMs) within compound AI systems. The authors propose LLMSelector, a framework built on two empirical observations: end-to-end performance is often monotonic in each module's performance, and per-module quality can be estimated accurately by an LLM. LLMSelector iteratively reassigns one module at a time to the model with the highest estimated module-wise performance, so its API-call budget scales linearly with the number of modules despite an exponentially large allocation space. Experiments on representative compound systems such as multi-agent debate and self-refine, using GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5, show accuracy gains of 5%-70% over allocating the same model to every module. The result is a fine-grained model-selection approach that jointly considers accuracy, computational efficiency, and deployment cost.

📝 Abstract
Compound AI systems that combine multiple LLM calls, such as self-refine and multi-agent-debate, achieve strong performance on many AI tasks. We address a core question in optimizing compound systems: for each LLM call or module in the system, how should one decide which LLM to use? We show that these LLM choices have a large effect on quality, but the search space is exponential. We propose LLMSelector, an efficient framework for model selection in compound systems, which leverages two key empirical insights: (i) end-to-end performance is often monotonic in how well each module performs, with all other modules held fixed, and (ii) per-module performance can be estimated accurately by an LLM. Building upon these insights, LLMSelector iteratively selects one module and allocates to it the model with the highest module-wise performance, as estimated by an LLM, until no further gain is possible. LLMSelector is applicable to any compound system with a bounded number of modules, and its number of API calls scales linearly with the number of modules, achieving high-quality model allocation both empirically and theoretically. Experiments with popular compound systems such as multi-agent debate and self-refine using LLMs such as GPT-4o, Claude 3.5 Sonnet and Gemini 1.5 show that LLMSelector confers 5%-70% accuracy gains compared to using the same LLM for all modules.
Problem

Research questions and friction points this paper is trying to address.

How should each LLM call or module in a compound AI system be assigned a model?
Per-module model choices have a large effect on end-to-end quality, yet the search space over allocations grows exponentially with the number of modules.
Exhaustive search over this allocation space is infeasible, so an efficient selection procedure is needed.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMSelector: an efficient framework for per-module model selection in compound systems
Module-level performance monotonicity plus LLM-based performance estimation guide an iterative greedy allocation
API-call cost scales linearly with the number of modules, with high-quality allocations both empirically and theoretically
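The iterative procedure described in the abstract can be sketched as a greedy loop: repeatedly pick a module, reassign it to the model with the highest estimated module-wise score, and keep the change only if the end-to-end score improves. This is a minimal sketch, not the paper's implementation; `evaluate_system` and `estimate_module_score` are hypothetical stand-ins (in LLMSelector, the module-wise estimate comes from an LLM judge).

```python
def llmselector(modules, models, evaluate_system, estimate_module_score):
    """Greedy per-module model allocation (sketch).

    modules: list of module names in the compound system
    models: list of candidate LLM names
    evaluate_system: allocation dict -> end-to-end score
    estimate_module_score: (module, model, allocation) -> estimated module-wise score
    """
    # Start from a uniform allocation (the baseline the paper compares against).
    allocation = {m: models[0] for m in modules}
    best = evaluate_system(allocation)

    improved = True
    while improved:
        improved = False
        for module in modules:
            # Pick the model with the highest estimated module-wise score,
            # holding all other modules fixed (monotonicity assumption).
            candidate = max(
                models,
                key=lambda mdl: estimate_module_score(module, mdl, allocation),
            )
            if candidate != allocation[module]:
                trial = dict(allocation, **{module: candidate})
                score = evaluate_system(trial)
                if score > best:
                    allocation, best, improved = trial, score, True
    return allocation, best
```

Each pass makes one estimation sweep per module, which is where the linear scaling in the number of modules comes from; the loop stops once no single-module reassignment improves the end-to-end score.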