Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks

📅 2026-04-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of deploying large language models (LLMs) for intelligent Earth observation on low-Earth-orbit satellites, which are constrained by limited memory and high inference latency. To overcome these limitations, the authors propose a multi-satellite collaborative inference framework that partitions an LLM into sub-models distributed across satellites, integrating pipeline parallelism with an adaptive compression mechanism for intermediate activations to reduce communication overhead and latency. The inference latency minimization problem is formulated as a shortest-path problem on a weighted directed acyclic graph and solved with an enhanced A* search algorithm. Experimental results demonstrate that, compared to existing approaches, the proposed method reduces inference latency by up to 42%, cuts communication overhead by 71%, and incurs less than 1% accuracy degradation.
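The pipeline-parallelism benefit described above can be illustrated with the classic pipeline fill model: the first micro-batch traverses every stage sequentially, while subsequent micro-batches are bottlenecked only by the slowest stage. This is a generic sketch of that model, not the paper's exact latency expression; the function name and micro-batch framing are illustrative assumptions.

```python
def pipelined_latency(stage_delays, num_microbatches):
    """Latency under micro-batch pipelining (generic fill model).

    The first micro-batch pays the sum of all per-stage delays; each of
    the remaining num_microbatches - 1 micro-batches then exits at the
    rate of the slowest (bottleneck) stage.
    """
    return sum(stage_delays) + (num_microbatches - 1) * max(stage_delays)


def sequential_latency(stage_delays, num_microbatches):
    """Baseline: no overlap, every micro-batch pays every stage in turn."""
    return num_microbatches * sum(stage_delays)
```

For example, with three stages of delay [1, 2, 1] and 3 micro-batches, pipelining yields 4 + 2 * 2 = 8 time units versus 12 sequentially, showing how overlapping sub-model inference with activation transmission shortens end-to-end delay.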
📝 Abstract
Low Earth orbit (LEO) satellites play an essential role in intelligent Earth observation by leveraging artificial intelligence models. However, limited onboard memory and excessive inference delay prevent the practical deployment of large language models (LLMs) on a single satellite. In this paper, we propose a communication-efficient collaborative LLM inference scheme for LEO satellite networks. Specifically, the entire LLM is split into multiple sub-models, each deployed on a satellite, thereby enabling collaborative LLM inference via exchanging intermediate activations between satellites. The proposed scheme also leverages a pipeline parallelism mechanism that overlaps sub-model inference with intermediate activation transmission, thereby reducing LLM inference delay. An adaptive activation compression scheme is designed to mitigate cumulative errors from multi-stage model splitting while preserving inference accuracy. Furthermore, we formulate the LLM inference delay minimization problem by jointly optimizing model splitting and compression ratios under onboard memory and inference accuracy constraints. The problem is transformed into a shortest-path search over a directed acyclic graph whose edge weights explicitly quantify the inference delay induced by model splitting and compression strategies, which is solved via a modified A*-based search algorithm. Extensive simulation results indicate that the proposed solution can reduce inference delay by up to 42% and communication overhead by up to 71% compared to state-of-the-art benchmarks, while keeping the inference accuracy loss below 1%.
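The shortest-path formulation in the abstract can be sketched as follows: treat each (layer, satellite) pair as a graph node, with an edge cost combining the compute delay of the next layer on the target satellite and the inter-satellite transmission delay of the (possibly compressed) activation. The node layout, delay tables, and use of plain Dijkstra (A* with a zero heuristic) below are illustrative assumptions; the paper's actual edge weights also account for compression ratios and accuracy constraints.

```python
import heapq

def min_inference_delay(num_layers, num_sats, compute_delay, comm_delay):
    """Shortest path over a DAG of (completed_layer, satellite) nodes.

    compute_delay[l][s]: delay to run layer l+1 on satellite s (hypothetical).
    comm_delay[s][s2]:   delay to ship activations from s to s2 (hypothetical).
    Returns the minimum total delay to finish all layers, starting on sat 0.
    """
    start = (0, 0)                       # no layers done, on satellite 0
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, (layer, sat) = heapq.heappop(pq)
        if layer == num_layers:          # all layers executed
            return d
        if d > dist.get((layer, sat), float("inf")):
            continue                     # stale queue entry
        for nxt in range(num_sats):
            # Edge weight: run the next layer on `nxt`, plus the activation
            # transfer cost if the sub-model moves to a different satellite.
            w = compute_delay[layer][nxt]
            if nxt != sat:
                w += comm_delay[sat][nxt]
            nd = d + w
            if nd < dist.get((layer + 1, nxt), float("inf")):
                dist[(layer + 1, nxt)] = nd
                heapq.heappush(pq, (nd, (layer + 1, nxt)))
    return float("inf")
```

On a DAG, an informed A* heuristic (e.g. a lower bound on the remaining layers' delay) would prune this search further; the zero-heuristic version above just makes the graph structure explicit.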
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
LEO Satellite Networks
Collaborative Inference
Communication Efficiency
Inference Delay
Innovation

Methods, ideas, or system contributions that make the work stand out.

collaborative LLM inference
LEO satellite networks
pipeline parallelism
adaptive activation compression
model splitting
Songge Zhang
School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, 518055, China; and Frontier Research Center, Pengcheng Laboratory, Shenzhen, 518055, China
Wen Wu
Associate Researcher, Pengcheng Laboratory, China, IEEE Senior Member
Wireless networking · network AI · network slicing · digital twin
Liang Li
Pengcheng Laboratory
Edge Intelligence · Wireless Networks
Ye Wang
Pengcheng Laboratory, Shenzhen, 518055, China
Xuemin (Sherman) Shen
University Professor, University of Waterloo
Wireless networking · MAC · Network security · VANET