Multi-Agentic AI for Fairness-Aware and Accelerated Multi-modal Large Model Inference in Real-world Mobile Edge Networks

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying large multimodal models in mobile edge networks faces significant challenges, including high latency, resource heterogeneity, and scheduling fairness. This work proposes a multi-agent AI framework that coordinates three types of agents—long-term planners, short-term prompt schedulers, and node deployers—to jointly optimize the inference pipeline. It is the first to integrate multi-agent collaboration and natural language reasoning into edge-based large model scheduling, enabling rapid adaptation to dynamic environments without fine-tuning. By combining a foundation language model–based multi-agent system, containerized deployment, and runtime telemetry analysis, the approach achieves over 80% reduction in average latency and a normalized Jain’s fairness index of 0.90 on a metropolitan-scale testbed, substantially outperforming existing baselines.

📝 Abstract
Generative AI (GenAI) has transformed applications in natural language processing and content creation, yet centralized inference remains hindered by high latency, limited customizability, and privacy concerns. Deploying large models (LMs) in mobile edge networks emerges as a promising solution. However, it also poses new challenges, including heterogeneous multi-modal LMs with diverse resource demands and inference speeds, varied prompt/output modalities that complicate orchestration, and resource-limited infrastructure ill-suited for concurrent LM execution. In response, we propose a Multi-Agentic AI framework for latency- and fairness-aware multi-modal LM inference in mobile edge networks. Our solution includes a long-term planning agent, a short-term prompt scheduling agent, and multiple on-node LM deployment agents, all powered by foundation language models. These agents cooperatively optimize prompt routing and LM deployment through natural language reasoning over runtime telemetry and historical experience. To evaluate its performance, we further develop a city-wide testbed that supports network monitoring, containerized LM deployment, intra-server resource management, and inter-server communications. Experiments demonstrate that our solution reduces average latency by over 80% and improves fairness (normalized Jain's index) to 0.90 compared with existing baselines. Moreover, our solution adapts quickly without fine-tuning, offering a generalizable approach to optimizing GenAI services in edge environments.
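The fairness metric reported in the abstract, Jain's index, has a standard closed form: for per-user allocations x_1..x_n it is (Σx_i)² / (n·Σx_i²), ranging from 1/n (one user gets everything) to 1 (perfect equality). A minimal sketch of how it might be computed is below; the function names and the min-max rescaling used for the "normalized" variant are assumptions, since the paper's exact normalization is not specified here.

```python
def jain_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    n = len(allocations)
    total = sum(allocations)
    sum_sq = sum(x * x for x in allocations)
    return (total * total) / (n * sum_sq)

def normalized_jain(allocations):
    """Rescale Jain's index from [1/n, 1] to [0, 1].

    This min-max rescaling is one common convention and is assumed here;
    the paper may normalize differently.
    """
    n = len(allocations)
    j = jain_index(allocations)
    return (j - 1.0 / n) / (1.0 - 1.0 / n)

# Example: equal per-user latencies are perfectly fair; a single
# dominant user drives the normalized index toward 0.
print(jain_index([1, 1, 1, 1]))        # 1.0
print(normalized_jain([1, 0, 0, 0]))   # 0.0
```

Under this convention, the reported 0.90 would indicate allocations close to the perfectly even case across users.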
Problem

Research questions and friction points this paper is trying to address.

Multi-modal Large Models
Mobile Edge Networks
Fairness-aware Inference
Latency
Resource-constrained Infrastructure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agentic AI
Fairness-aware Inference
Mobile Edge Computing
Multi-modal Large Models
Prompt Scheduling
Haiyuan Li
Smart Internet Lab, Department of Electrical and Electronic Engineering, University of Bristol, BS8 1QU, U.K.
Hari Madhukumar
Smart Internet Lab, Department of Electrical and Electronic Engineering, University of Bristol, BS8 1QU, U.K.
Shuangyi Yan
Associate Professor, Smart Internet Lab, University of Bristol
Optical Networks, 5G and Beyond, Machine Learning, Coherent Detection
Yulei Wu
Associate Professor, University of Bristol, UK
Digital Twin, AI Native Network, Edge Intelligence, Trustworthy AI
Dimitra Simeonidou
Smart Internet Lab, Department of Electrical and Electronic Engineering, University of Bristol, BS8 1QU, U.K.