Mobile Edge Intelligence for Large Language Models: A Contemporary Survey

📅 2024-07-09
🏛️ IEEE Communications Surveys & Tutorials
📈 Citations: 15
✨ Influential: 0
🤖 AI Summary
To address the high latency, communication overhead, and privacy risks of deploying large language models (LLMs) on resource-constrained edge devices, this survey introduces Mobile Edge Intelligence for LLMs (MEI4LLM), an architectural framework that sits between cloud AI and on-device AI. The article first illustrates killer applications that motivate deploying LLMs at the network edge, reviews preliminaries of LLMs and mobile edge intelligence (MEI) along with resource-efficient LLM techniques, and then organizes MEI4LLM around three pillars: edge LLM caching and delivery, edge LLM training, and edge LLM inference. Covered components include model compression, parameter-efficient tuning, and computation offloading across cloud, edge, and device tiers. The survey closes by identifying future research opportunities, aiming to provide a technical foundation for privacy-preserving, low-latency, and energy-efficient LLM deployment at the edge.
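One of the MEI4LLM components named above is edge LLM caching and delivery. As a toy illustration (not the paper's design), an edge server might keep recently used model shards in a bounded LRU cache, with misses falling back to the cloud; the class and names below (`EdgeModelCache`, shard IDs) are hypothetical:

```python
from collections import OrderedDict


class EdgeModelCache:
    """Toy LRU cache of model shards at an edge server (illustrative sketch)."""

    def __init__(self, capacity_shards: int):
        self.capacity = capacity_shards
        self._cache = OrderedDict()  # shard_id -> shard payload

    def get(self, shard_id):
        """Return a cached shard, or None on a miss (caller fetches from cloud)."""
        if shard_id not in self._cache:
            return None
        self._cache.move_to_end(shard_id)  # mark as most recently used
        return self._cache[shard_id]

    def put(self, shard_id, payload):
        """Insert a shard, evicting the least-recently-used one if over capacity."""
        self._cache[shard_id] = payload
        self._cache.move_to_end(shard_id)
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict LRU entry
```

Real edge caching must also weigh shard popularity, delivery bandwidth, and multi-tenant sharing, which this sketch ignores.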

πŸ“ Abstract
On-device large language models (LLMs), referring to running LLMs on edge devices, have raised considerable interest since they are more cost-effective, latency-efficient, and privacy-preserving compared with the cloud paradigm. Nonetheless, the performance of on-device LLMs is intrinsically constrained by resource limitations on edge devices. Sitting between cloud and on-device AI, mobile edge intelligence (MEI) presents a viable solution by provisioning AI capabilities at the edge of mobile networks, enabling end users to offload heavy AI computation to capable edge servers nearby. This article provides a contemporary survey on harnessing MEI for LLMs. We begin by illustrating several killer applications to demonstrate the urgent need for deploying LLMs at the network edge. Next, we present the preliminaries of LLMs and MEI, followed by resource-efficient LLM techniques. We then present an architectural overview of MEI for LLMs (MEI4LLM), outlining its core components and how it supports the deployment of LLMs. Subsequently, we delve into various aspects of MEI4LLM, extensively covering edge LLM caching and delivery, edge LLM training, and edge LLM inference. Finally, we identify future research opportunities. We hope this article inspires researchers in the field to leverage mobile edge computing to facilitate LLM deployment, thereby unleashing the potential of LLMs across various privacy- and delay-sensitive applications.
Problem

Research questions and friction points this paper is trying to address.

Running LLMs on edge devices whose memory, compute, and energy budgets fall far short of what LLMs demand.
Using mobile edge intelligence (MEI) to offload heavy LLM computation to capable edge servers nearby.
Surveying resource-efficient techniques for LLM caching and delivery, training, and inference in privacy- and delay-sensitive applications.
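The offloading question in the second point reduces, in its simplest form, to comparing estimated latencies for local versus edge execution. A minimal sketch under an assumed cost model (upload time plus compute time; the function, class, and parameter names are illustrative, not from the survey):

```python
from dataclasses import dataclass


@dataclass
class Device:
    flops_per_s: float  # sustained on-device compute throughput
    uplink_bps: float   # uplink bandwidth to the edge server (bits/s)


def offload_decision(prompt_bytes: int, workload_flops: float,
                     device: Device, edge_flops_per_s: float) -> str:
    """Pick the execution site with the lower estimated latency.

    Hypothetical cost model: local cost is compute only; edge cost is
    prompt upload plus compute on the (faster) edge server.
    """
    local_latency = workload_flops / device.flops_per_s
    edge_latency = (prompt_bytes * 8 / device.uplink_bps
                    + workload_flops / edge_flops_per_s)
    return "edge" if edge_latency < local_latency else "local"
```

Under this model, large workloads on a slow device are offloaded, while small ones stay local; practical schedulers would also account for queueing at the server, energy, and result-download time.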
Innovation

Methods, ideas, or system contributions that make the work stand out.

Positions mobile edge intelligence (MEI) between cloud and on-device AI, provisioning LLM capabilities at the edge of mobile networks.
Surveys resource-efficient techniques (e.g., model compression and parameter-efficient tuning) that make on-device LLMs practical.
Proposes the MEI4LLM architecture, spanning edge LLM caching and delivery, edge LLM training, and edge LLM inference.
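One concrete instance of the model compression mentioned above is post-training quantization. A minimal sketch of symmetric per-tensor 8-bit quantization (a common baseline; not the survey's specific method, and real systems quantize per-channel with calibration data):

```python
def quantize_int8(weights):
    """Map float weights to int8 levels with one shared scale (symmetric)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero tensors
    q = [round(w / scale) for w in weights]            # values in [-127, 127]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from quantized values."""
    return [v * scale for v in q]
```

Storing int8 values instead of float32 cuts weight memory by roughly 4x, at the cost of a per-weight rounding error bounded by the scale.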