Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards

📅 2025-07-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Compound AI systems—comprising heterogeneous, non-differentiable components (e.g., LLMs, domain-specific tools, traditional ML models) and diverse configuration types (prompts, hyperparameters, model weights)—resist end-to-end optimization. Method: Optimas maintains one learnable Local Reward Function (LRF) per component and enforces global alignment via a local-global consistency property; it uses gradient-free optimization to tune heterogeneous configurations independently. Contribution/Results: The key innovation is decoupling optimization from global differentiability: the LRFs are adapted each iteration to track component characteristics, enabling unified optimization of heterogeneous configurations without requiring end-to-end differentiability. Experiments on five real-world compound AI systems demonstrate an average performance gain of 11.92% over strong baselines, improving both optimizability and generalizability.

📝 Abstract
Compound AI systems integrating multiple components, such as Large Language Models, specialized tools, and traditional machine learning models, are increasingly deployed to solve complex real-world tasks. However, optimizing compound systems remains challenging due to their non-differentiable structures and diverse configuration types across components, including prompts, hyperparameters, and model parameters. To address this challenge, we propose Optimas, a unified framework for effective optimization of compound systems. The core idea of Optimas is to maintain one Local Reward Function (LRF) per component, each satisfying a local-global alignment property, i.e., each component's local reward correlates with the global system performance. In each iteration, Optimas efficiently adapts the LRFs to maintain this property while simultaneously maximizing each component's local reward. This approach enables independent updates of heterogeneous configurations using the designated optimization method, while ensuring that local improvements consistently lead to performance gains. We present extensive evaluations across five real-world compound systems to demonstrate that Optimas outperforms strong baselines by an average improvement of 11.92%, offering a general and effective approach for improving compound systems. Our website is at https://optimas.stanford.edu.
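The iteration the abstract describes—per-component local rewards that correlate with global performance, maximized independently by gradient-free search—can be illustrated with a toy sketch. All names here (`global_metric`, `local_reward`, `local_search`) are illustrative assumptions, not the authors' API; in Optimas the local reward functions are learned each iteration to satisfy the alignment property, whereas this toy aligns them by construction.

```python
import random

random.seed(0)

def global_metric(configs):
    # Stand-in for end-to-end system performance (non-differentiable).
    return -sum((c - 3.0) ** 2 for c in configs.values())

def local_reward(name, config):
    # Toy component-specific local reward, aligned with the global
    # metric by construction; Optimas *learns* these functions.
    return -(config - 3.0) ** 2

def local_search(name, config, steps=200, scale=0.5):
    # Gradient-free hill climbing on one component's configuration.
    best = config
    for _ in range(steps):
        cand = best + random.uniform(-scale, scale)
        if local_reward(name, cand) > local_reward(name, best):
            best = cand
    return best

# Heterogeneous configurations updated independently, in any order.
configs = {"retriever": 0.0, "llm_prompt": 6.0, "ranker": -2.0}
before = global_metric(configs)
for name in configs:
    configs[name] = local_search(name, configs[name])
after = global_metric(configs)
assert after > before  # aligned local improvements raise the global score
```

Because each local reward correlates with the global metric, improving any component in isolation cannot hurt overall performance—this is the property Optimas enforces while jointly tuning prompts, hyperparameters, and weights.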
Problem

Research questions and friction points this paper is trying to address.

Optimizing non-differentiable compound AI systems
Aligning local rewards with global performance
Enabling independent updates of heterogeneous configurations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Local Reward Functions per component
Aligns local rewards with global performance
Enables independent optimization of configurations