Advancing Multi-Robot Networks via MLLM-Driven Sensing, Communication, and Computation: A Comprehensive Survey

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges that multi-robot systems face when executing high-level tasks driven by natural language instructions: perceptual overload, limited communication bandwidth, and imbalanced allocation of computational resources. The survey frames R2X as an intent-to-resource orchestration problem in which task intent is used to jointly optimize sensing, communication, and computation, closing the loop from semantic understanding by multimodal large language models (MLLMs) to physical execution. It draws on edge-cloud collaboration, semantic-aware sensing selection, predictive communication, and digital-twin technology. Four end-to-end demonstrations (digital-twin warehouse navigation, mobility-driven proactive MCS control, semantic following with a FollowMe robot, and open-vocabulary trash sorting on real hardware) are used to show why R2X orchestration outperforms purely on-device baselines in payload, latency, and task success.
📝 Abstract
Imagine advanced humanoid robots, powered by multimodal large language models (MLLMs), coordinating missions across industries like warehouse logistics, manufacturing, and safety rescue. While individual robots show local autonomy, realistic tasks demand coordination among multiple agents sharing vast streams of sensor data. Communication is indispensable, yet transmitting comprehensive data can overwhelm networks, especially when a system-level orchestrator or cloud-based MLLM fuses multimodal inputs for route planning or anomaly detection. These tasks are often initiated by high-level natural language instructions. This intent serves as a filter for resource optimization: by understanding the goal via MLLMs, the system can selectively activate relevant sensing modalities, dynamically allocate bandwidth, and determine computation placement. Thus, R2X is fundamentally an intent-to-resource orchestration problem where sensing, communication, and computation are jointly optimized to maximize task-level success under resource constraints. This survey examines how integrated design paves the way for multi-robot coordination under MLLM guidance. We review state-of-the-art sensing modalities, communication strategies, and computing approaches, highlighting how reasoning is split between on-device models and powerful edge/cloud servers. We present four end-to-end demonstrations (sense -> communicate -> compute -> act): (i) digital-twin warehouse navigation with predictive link context, (ii) mobility-driven proactive MCS control, (iii) a FollowMe robot with a semantic-sensing switch, and (iv) real-hardware open-vocabulary trash sorting via edge-assisted MLLM grounding. We emphasize system-level metrics -- payload, latency, and success -- to show why R2X orchestration outperforms purely on-device baselines.
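The intent-to-resource idea in the abstract (activate relevant sensing modalities, allocate bandwidth, choose computation placement) can be illustrated with a minimal sketch. Everything here is a hypothetical illustration rather than the survey's method: the `Sensor` type, the `orchestrate` function, the keyword-overlap relevance score, the proportional bandwidth split, and all numeric values are assumptions, and the intent keywords are assumed to have already been extracted from the instruction by an MLLM.

```python
from dataclasses import dataclass

# NOTE: every name, score, and threshold below is a hypothetical illustration,
# not an interface or value taken from the survey.


@dataclass(frozen=True)
class Sensor:
    name: str
    payload_kbps: float  # data rate if this modality is streamed in full
    tags: frozenset      # task keywords for which this modality is useful


def orchestrate(intent_keywords, sensors, link_kbps, latency_budget_ms,
                edge_rtt_ms, edge_compute_ms, local_compute_ms):
    """Map a parsed task intent to sensing, bandwidth, and compute placement."""
    intent = set(intent_keywords)

    # 1) Sensing: keep only modalities whose tags overlap the intent.
    scored = [(len(s.tags & intent), s) for s in sensors]
    active = [(score, s) for score, s in scored if score > 0]

    # 2) Communication: split the link budget in proportion to relevance,
    #    never granting a sensor more than its own payload rate.
    total = sum(score for score, _ in active)
    bandwidth = {
        s.name: min(s.payload_kbps, link_kbps * score / total)
        for score, s in active
    }

    # 3) Computation: offload only if the edge path fits the latency budget
    #    and beats on-device inference; otherwise stay on the robot.
    edge_total_ms = edge_rtt_ms + edge_compute_ms
    offload = edge_total_ms <= latency_budget_ms and edge_total_ms < local_compute_ms
    placement = "edge" if offload else "on-device"

    return {
        "active": [s.name for _, s in active],
        "bandwidth_kbps": bandwidth,
        "placement": placement,
    }


# Usage with made-up sensors and link conditions.
sensors = [
    Sensor("rgb_camera", 4000.0, frozenset({"follow", "sort", "navigate"})),
    Sensor("lidar", 2000.0, frozenset({"navigate"})),
    Sensor("microphone", 64.0, frozenset({"speech"})),
]
plan = orchestrate({"navigate"}, sensors, link_kbps=3000.0,
                   latency_budget_ms=200.0, edge_rtt_ms=40.0,
                   edge_compute_ms=60.0, local_compute_ms=500.0)
print(plan)
```

In this toy setup the microphone is left inactive because none of its tags match the intent, and the link budget is shared between the two remaining modalities. A real orchestrator would presumably replace the keyword overlap with MLLM-derived relevance and the fixed latency figures with measured or predicted link state.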
Problem

Research questions and friction points this paper is trying to address.

multi-robot coordination
resource orchestration
multimodal large language models
intent-driven optimization
R2X
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLLM-driven orchestration
intent-aware resource allocation
multi-robot coordination
R2X optimization
edge-cloud collaboration
Hyun Jong Yang
Dept. of Electrical & Computer Engineering, Seoul National University
Communications, Signal Processing, Machine Learning
Howon Lee
Professor, Ajou University
Wireless Communications, Machine Learning for Wireless
Kyuhong Shim
Sungkyunkwan University
Deep Learning, Speech Processing, Language Processing
Jeongho Kwak
Department of Computer Science and Engineering, Korea University
Cloud/Edge Computing, IoT Systems, Network Softwarization, SDN in 5G Networks, Wireless Edge Caching
Hyunsoo Kim
Department of Mechanical Engineering, Korea Advanced Institute of Science and Technology
Soft Robotics, Haptic, Mechanical Metamaterials, SMA, Laser Induced Graphene
Donghoon Kim
Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
Khoa Anh Ngo
Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
Sehyun Ryu
Ph.D Candidate, POSTECH
wireless communications, privacy-preserving machine learning, learning theory
Jaehyun Choi
PhD Candidate @ KAIST
Dataset Condensation, Domain Adaptation / Domain Generalization, Image / Video Generation
Youbin Kim
Department of AI Convergence Network, Ajou University, Suwon, South Korea
Chanjun Moon
Department of AI Convergence Network, Ajou University, Suwon, South Korea
Michael Ryoo
Department of Computer Science and the AI Institute, Stony Brook University, Stony Brook, NY 11794, USA
Byonghyo Shim
Professor, Department of Electrical and Computer Engineering, Seoul National University
Wireless Communications, Deep Learning, Information Theory, Statistical Signal Processing