Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems

📅 2026-01-15

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses the high inference latency of multi-agent systems, caused by multi-step reasoning and repeated model invocations, which often prevents them from meeting real-time requirements. To this end, the authors propose LAMaS, a framework that introduces explicit latency supervision into learning-based multi-agent orchestration. LAMaS uses a learning-driven controller to construct an execution topology graph and applies critical path analysis to optimize parallel scheduling. Rather than centering on task performance or cost, as conventional approaches do, it prioritizes reducing latency along the critical path. Experiments show that LAMaS shortens the critical path by 38%–46% compared with the state-of-the-art baseline for multi-agent architecture search across multiple benchmarks, while maintaining or even improving task performance.
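
The central idea in the summary is that under parallel execution, end-to-end latency is set by the longest latency-weighted path through the execution graph, not by the sum of all agent calls. The sketch below illustrates this under stated assumptions: the topology is a DAG, each node carries a known per-agent latency, and independent agents can run fully in parallel. The function name `critical_path` and the agent names and timings are hypothetical and are not taken from the paper or its code.

```python
from collections import deque


def critical_path(latency, edges):
    """Return (length, path) of the longest latency-weighted path in a DAG.

    latency: dict mapping each agent node to its execution time
    edges: (u, v) pairs meaning agent v consumes u's output, so v waits for u
    """
    succ = {n: [] for n in latency}
    indeg = {n: 0 for n in latency}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    start = {n: 0.0 for n in latency}  # earliest start once all inputs are ready
    finish = {}                        # earliest finish with unlimited parallelism
    best_pred = {}                     # predecessor that finishes last (the bottleneck)
    queue = deque(n for n in latency if indeg[n] == 0)
    while queue:  # process nodes in Kahn's topological order
        n = queue.popleft()
        finish[n] = start[n] + latency[n]
        for v in succ[n]:
            if v not in best_pred or finish[n] > start[v]:
                start[v] = finish[n]
                best_pred[v] = n
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(finish) != len(latency):
        raise ValueError("execution graph contains a cycle")

    end = max(finish, key=finish.get)  # the last agent to finish sets end-to-end latency
    path = [end]
    while path[-1] in best_pred:       # walk bottleneck predecessors back to a source
        path.append(best_pred[path[-1]])
    return finish[end], path[::-1]


# Hypothetical five-agent topology; latencies are illustrative time units.
latency = {"planner": 2.0, "coder": 5.0, "searcher": 3.0, "critic": 1.5, "aggregator": 1.0}
edges = [("planner", "coder"), ("planner", "searcher"),
         ("coder", "aggregator"), ("searcher", "critic"), ("critic", "aggregator")]

length, path = critical_path(latency, edges)
print(length, path)           # 8.0 ['planner', 'coder', 'aggregator']
print(sum(latency.values()))  # 12.5 if the same calls ran one after another
```

Speeding up an agent that is off the critical path (here `searcher` or `critic`) leaves the 8.0-unit latency unchanged, which is why the summary frames the optimization target as the critical path rather than total work.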

📝 Abstract

Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but often incur high inference latency due to multi-step execution and repeated model invocations, severely limiting their scalability and usability in time-sensitive scenarios. Most existing approaches primarily optimize task performance and inference cost, and explicitly or implicitly assume sequential execution, making them suboptimal for controlling latency under parallel execution. In this work, we investigate learning-based orchestration of multi-agent systems with explicit latency supervision under parallel execution. We propose Latency-Aware Multi-agent System (LAMaS), a latency-aware multi-agent orchestration framework that enables parallel execution and explicitly optimizes the critical execution path, allowing the controller to construct execution topology graphs with lower latency under parallel execution. Our experiments show that our approach reduces critical path length by 38–46% compared to the state-of-the-art baseline for multi-agent architecture search across multiple benchmarks, while maintaining or even improving task performance. These results highlight the importance of explicitly optimizing latency under parallel execution when designing efficient multi-agent systems. The code is available at https://github.com/xishi404/LAMaS
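
The abstract contrasts objectives built on task performance and cost, which implicitly assume sequential execution, with one that explicitly supervises latency under parallel execution. One way to picture that difference, assuming a simple scalar trade-off, is to rank candidate topologies by task score minus a multiple of their critical-path length. The penalty weight `lam`, the helper names, the task scores, and the candidate graphs are all hypothetical; the paper's actual training signal may differ.

```python
def critical_path_length(latency, edges):
    """Longest latency-weighted path in a DAG (assumes the graph has no cycles)."""
    preds = {n: [] for n in latency}
    for u, v in edges:
        preds[v].append(u)
    memo = {}

    def finish(n):
        # An agent can finish only after its slowest input is ready.
        if n not in memo:
            memo[n] = latency[n] + max((finish(p) for p in preds[n]), default=0.0)
        return memo[n]

    return max(finish(n) for n in latency)


def latency_aware_score(task_score, latency, edges, lam=0.05):
    """Hypothetical trade-off: task quality minus lam times critical-path latency."""
    return task_score - lam * critical_path_length(latency, edges)


# Two hypothetical candidate topologies over the same three agents.
node_latency = {"a": 2.0, "b": 3.0, "c": 2.0}
candidates = {
    "chain": (0.80, node_latency, [("a", "b"), ("b", "c")]),   # a -> b -> c
    "fanout": (0.78, node_latency, [("a", "b"), ("a", "c")]),  # a -> {b, c}
}
best = max(candidates, key=lambda name: latency_aware_score(*candidates[name]))
print(best)  # fanout
```

Both candidates consume the same 7.0 units of total agent time, so a cost measure that assumes sequential execution cannot tell them apart; only the critical-path term (7.0 for the chain, 5.0 for the fan-out) favors the fan-out graph, even though its hypothetical task score is slightly lower.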

Problem

Research questions and friction points this paper is trying to address.

multi-agent systems

latency

parallel execution

inference latency

time-sensitive scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

latency-aware orchestration

parallel multi-agent systems

critical path optimization