Controlling Performance and Budget of a Centralized Multi-agent LLM System with Reinforcement Learning

📅 2025-11-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of jointly optimizing task performance and inference cost in multi-expert large language model (LLM) systems under dynamic budget constraints. To this end, the authors propose CoRL, a centralized multi-agent scheduling framework based on reinforcement learning that models expert selection and collaborative decision-making as a learnable bi-objective optimization problem, balancing task-performance rewards against computational cost penalties. CoRL is presented as the first approach to achieve budget-aware adaptive scheduling: it surpasses the best single-expert baseline under high budgets while maintaining strong efficiency and robustness under tight budgets. Extensive evaluation across four benchmark tasks demonstrates that CoRL significantly improves the cost-effectiveness and scalability of multi-LLM systems, establishing a new paradigm for resource-constrained, cooperative LLM deployment.

📝 Abstract
Large language models (LLMs) exhibit complementary strengths across domains and come with varying inference costs, motivating the design of multi-agent LLM systems where specialized models collaborate efficiently. Existing approaches predominantly rely on decentralized frameworks, which invoke multiple LLMs for every input and thus incur substantial and uncontrolled inference costs. In this work, we introduce a centralized multi-LLM framework in which a controller LLM selectively coordinates a pool of expert models in a cost-efficient and cost-controllable manner. We formulate this coordination problem as reinforcement learning with dual objectives: maximizing task performance while minimizing overall inference cost. In addition, we expect the multi-agent system to adapt its behavior to different budget conditions during inference. To this end, we propose CoRL, a reinforcement learning framework that optimizes the performance-cost trade-off in a controllable multi-budget setting. Experiments on four diverse benchmarks demonstrate that CoRL enables a single system to surpass the best expert LLM under high-budget settings while maintaining strong performance in more economical low-budget modes, highlighting the effectiveness of centralized coordination for scalable and cost-efficient multi-agent LLM systems.
Problem

Research questions and friction points this paper is trying to address.

Optimizing multi-agent LLM system performance while controlling inference costs
Selectively coordinating expert models through centralized reinforcement learning
Adapting system behavior to different budget constraints during inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Centralized controller selectively coordinates expert LLMs
Reinforcement learning optimizes the performance-cost trade-off
Adaptive system behavior under different budget conditions
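The bi-objective idea above, a task-performance reward offset by a budget-dependent cost penalty that steers the controller's expert selection, can be sketched as follows. This is a minimal illustrative toy, not the paper's method: the expert names, scores, costs, and the greedy selection rule are all assumptions standing in for CoRL's learned RL policy.

```python
def bi_objective_reward(task_score, inference_cost, cost_weight):
    """Reward = task performance minus a cost penalty.
    `cost_weight` stands in for a budget-dependent coefficient:
    small under loose budgets, large under tight ones."""
    return task_score - cost_weight * inference_cost

# Hypothetical expert pool: (name, expected task score, cost per call).
EXPERTS = [
    ("small-expert",  0.62, 1.0),
    ("medium-expert", 0.74, 3.0),
    ("large-expert",  0.81, 9.0),
]

def select_expert(cost_weight):
    """Greedy stand-in for the learned controller: pick the expert
    whose bi-objective reward is highest under the current budget."""
    return max(
        EXPERTS,
        key=lambda e: bi_objective_reward(e[1], e[2], cost_weight),
    )

# A small penalty (loose budget) favors the strong, expensive expert;
# a large penalty (tight budget) shifts selection to a cheap expert.
loose_choice = select_expert(cost_weight=0.01)[0]  # "large-expert"
tight_choice = select_expert(cost_weight=0.10)[0]  # "small-expert"
```

In the actual system, the controller would be trained with RL so that this selection behavior emerges from experience rather than from a hand-written greedy rule, and the budget condition would be supplied at inference time.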