When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

📅 2026-05-28
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This study addresses the trade-offs among task accuracy, monetary cost, and device energy consumption in cloud-edge collaborative multi-agent systems. The authors adapt two representative multi-agent designs into a hybrid architecture that pairs large language models (LLMs) in the cloud with small language models (SLMs) on edge devices. Using a controlled-variable methodology, they systematically analyze the impact of individual collaboration strategies and, for the first time, characterize the Pareto frontier of such systems. Their findings show that the optimal architecture is highly task-dependent and that more frontier-level compute does not consistently yield better performance. The work also quantifies how much SLMs can gain from collaborating with LLMs, delineating a tunable balance among accuracy, cost, and energy consumption.
📝 Abstract
The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substantial cost, and more cost-efficient small language models (SLMs), which are amenable to on-device inference. Hybrid multi-agent systems (MASs) combining on-device and cloud models offer a promising middle ground, but they also introduce a complex and poorly understood design space in which task accuracy, monetary cost, and edge energy consumption are tightly coupled. Hybrid components are not yet the most prevalent choice, and in the absence of general design principles they are typically introduced through ad hoc decisions tailored to specific domains. In this work, we examine this design space more systematically. We adapt two representative MAS architectures to support hybrid inference and study how individual design choices shift the operating point along the Pareto frontier of power, cost, and performance. Our findings paint a nuanced picture of hybrid MAS design: while SLMs can effectively benefit from LLM assistance, the optimal architecture is highly task-dependent, and greater frontier-level compute does not consistently translate to better performance.
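To make the notion of a Pareto frontier over accuracy, cost, and energy concrete, here is a minimal sketch of how one might filter candidate configurations down to the non-dominated ones. It is not the authors' method or code. The class `OperatingPoint`, the configuration names, the units, and every number below are hypothetical placeholders chosen for illustration; none of them are results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One hypothetical system configuration and its measured trade-offs."""
    name: str
    accuracy: float   # task accuracy, higher is better
    cost_usd: float   # monetary cost of cloud inference, lower is better
    energy_j: float   # energy drawn on the edge device, lower is better


def dominates(a: OperatingPoint, b: OperatingPoint) -> bool:
    """True if `a` is at least as good as `b` on every axis and strictly better on one."""
    no_worse = (
        a.accuracy >= b.accuracy
        and a.cost_usd <= b.cost_usd
        and a.energy_j <= b.energy_j
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.cost_usd < b.cost_usd
        or a.energy_j < b.energy_j
    )
    return no_worse and strictly_better


def pareto_frontier(points: list[OperatingPoint]) -> list[OperatingPoint]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative, made-up numbers only; NOT measurements from the paper.
points = [
    OperatingPoint("slm_only", accuracy=0.55, cost_usd=0.00, energy_j=40.0),
    OperatingPoint("llm_only", accuracy=0.80, cost_usd=1.00, energy_j=5.0),
    OperatingPoint("hybrid_a", accuracy=0.75, cost_usd=0.30, energy_j=25.0),
    OperatingPoint("hybrid_b", accuracy=0.70, cost_usd=0.50, energy_j=30.0),
]

frontier = pareto_frontier(points)
print([p.name for p in frontier])
```

With these placeholder values, `hybrid_b` drops out because `hybrid_a` is better on all three axes, while the other three each win on at least one axis and so remain on the frontier. Changing a single design choice corresponds to moving one point and re-running the filter, which is one simple way to read "shifting the operating point along the frontier".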
Problem

Research questions and friction points this paper is trying to address.

hybrid multi-agent systems
large language models
small language models
on-device inference
design space
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid multi-agent systems
on-device inference
large language models
small language models
Pareto optimization