A Review on Edge Large Language Models: Design, Execution, and Applications

📅 2024-09-29
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
Deploying large language models (LLMs) on edge devices faces fundamental challenges, including constrained computational resources, limited memory capacity, and hardware heterogeneity. To address these, this paper systematically surveys the full lifecycle of edge LLMs and introduces a comprehensive, stack-wide technical taxonomy spanning lightweight model design (e.g., pruning, quantization, distillation), runtime optimization (e.g., memory-aware inference, device-adaptive scheduling), and on-device deployment (e.g., cloud-edge collaborative frameworks). It proposes a novel cross-platform co-deployment paradigm that unifies pre-deployment model compression with dynamic execution optimization. Drawing on a synthesis of over 120 state-of-the-art studies, the work identifies five persistent technical bottlenecks and six key future research directions, providing both a reusable conceptual framework and practical guidelines for deploying AI at the edge.
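Among the compression techniques the summary lists, quantization is the most widely deployed on edge hardware. As a minimal sketch (not the paper's method), symmetric per-tensor int8 post-training quantization maps float weights to the range [-127, 127] with a single scale factor; all names here are illustrative:

```python
# Illustrative post-training weight quantization: symmetric per-tensor int8.
# Weights are divided by a scale, rounded, and stored as int8; dequantization
# multiplies back. Storage drops 4x versus float32 at a bounded error.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Return (int8 weights, scale) for symmetric per-tensor quantization."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct an approximate float32 tensor from int8 weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print(q.nbytes / w.nbytes)                  # 0.25 (int8 vs float32)
print(float(np.abs(w - w_hat).max()) <= s)  # True: per-weight error < one step
```

Production edge runtimes typically refine this with per-channel scales and calibration data, but the storage-versus-error trade-off is the same.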

๐Ÿ“ Abstract
Large language models (LLMs) have revolutionized natural language processing with their exceptional capabilities. However, deploying LLMs on resource-constrained edge devices presents significant challenges due to computational limitations, memory constraints, and edge hardware heterogeneity. This survey summarizes recent developments in edge LLMs across their lifecycle, examining resource-efficient designs from pre-deployment techniques to runtime optimizations. Additionally, it explores on-device LLM applications in personal, enterprise, and industrial scenarios. By synthesizing advancements and identifying future directions, this survey aims to provide a comprehensive understanding of state-of-the-art methods for deploying LLMs on edge devices, bridging the gap between their immense potential and edge computing limitations.
Problem

Research questions and friction points this paper is trying to address.

Deploying LLMs on resource-constrained edge devices
Computational, memory, and hardware-heterogeneity limitations of edge platforms
Optimizing the full edge-LLM lifecycle, from pre-deployment design to runtime
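The memory constraint above is easy to quantify with back-of-envelope arithmetic: weight storage alone (ignoring the KV cache and activations) already exceeds typical edge device memory at full precision. The figures below use an illustrative 7B-parameter model:

```python
# Weight-memory estimate for an LLM at different precisions (weights only;
# KV cache and activations add more). Illustrates why compression is a
# precondition for edge deployment, not an optimization.
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Memory needed to store the weights, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(7e9, bits):5.1f} GiB")
# A 7B model needs ~26 GiB at fp32 and ~13 GiB at fp16, but only ~3.3 GiB
# at 4-bit, which fits within the RAM of many phones and single-board devices.
```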
Innovation

Methods, ideas, or system contributions that make the work stand out.

Resource-efficient model design (pruning, quantization, distillation) for edge LLMs
Runtime inference optimizations tailored to edge device constraints
On-device applications spanning personal, enterprise, and industrial domains
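One runtime idea the survey covers, cloud-edge collaborative deployment, can be sketched as a simple admission check: serve a request on-device when it fits the device's budget, otherwise fall back to the cloud. All names and thresholds below are hypothetical, not the survey's system:

```python
# Hypothetical sketch of device-adaptive request routing in a cloud-edge
# setup. Real schedulers also weigh latency, battery, and privacy policy;
# this shows only the core fit-or-fallback decision.
from dataclasses import dataclass

@dataclass
class EdgeBudget:
    free_mem_gib: float      # memory currently available on the device
    max_prompt_tokens: int   # longest prompt the on-device model handles well

def route(prompt_tokens: int, model_mem_gib: float, budget: EdgeBudget) -> str:
    """Return 'edge' if the request fits on-device, else 'cloud'."""
    fits_memory = model_mem_gib <= budget.free_mem_gib
    fits_context = prompt_tokens <= budget.max_prompt_tokens
    return "edge" if fits_memory and fits_context else "cloud"

budget = EdgeBudget(free_mem_gib=4.0, max_prompt_tokens=2048)
print(route(512, 3.3, budget))   # 'edge'  (a 4-bit 7B model fits)
print(route(8192, 3.3, budget))  # 'cloud' (prompt exceeds the edge context)
```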
๐Ÿ”Ž Similar Papers
Y
Yue Zheng
Zhejiang University of Technology, China
Y
Yuhao Chen
Zhejiang University, China
Bin Qian
Bin Qian
Post-doctoral researcher at Zhejiang University
internet of thingsedge computingdeep learning
X
Xiufang Shi
Zhejiang University of Technology, China
Yuanchao Shu
Yuanchao Shu
Microsoft Research
mobilenetworked systemsedge computingML analytics
J
Jiming Chen
Zhejiang University, China