AI Summary
Deploying large language models (LLMs) on edge devices faces fundamental challenges, including constrained computational resources, limited memory capacity, and hardware heterogeneity. To address these, this paper systematically surveys the full lifecycle of edge LLMs and introduces a comprehensive, stack-wide technical taxonomy spanning lightweight model design (e.g., pruning, quantization, distillation), runtime optimization (e.g., memory-aware inference, device-adaptive scheduling), and on-device deployment (e.g., cloud-edge collaborative frameworks). It proposes a cross-platform co-deployment paradigm that unifies pre-deployment model compression with dynamic execution optimization. Based on a synthesis of over 120 state-of-the-art studies, the work identifies five persistent technical bottlenecks and six key future research directions. The resulting methodology provides both a reusable conceptual framework and practical guidelines for deploying AI at the edge.
Abstract
Large language models (LLMs) have revolutionized natural language processing with their exceptional capabilities. However, deploying LLMs on resource-constrained edge devices presents significant challenges due to computational limitations, memory constraints, and edge hardware heterogeneity. This survey summarizes recent developments in edge LLMs across their lifecycle, examining resource-efficient designs from pre-deployment techniques to runtime optimizations. Additionally, it explores on-device LLM applications in personal, enterprise, and industrial scenarios. By synthesizing advancements and identifying future directions, this survey aims to provide a comprehensive understanding of state-of-the-art methods for deploying LLMs on edge devices, bridging the gap between their immense potential and edge computing limitations.
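As an illustration of the pre-deployment compression techniques surveyed (pruning, quantization, distillation), the sketch below shows symmetric per-tensor int8 weight quantization, one of the simplest schemes in this family. This is a minimal, generic example for intuition, not a method proposed by the survey itself; real edge deployments typically use finer-grained (per-channel or group-wise) schemes.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    # Map the largest absolute weight to the int8 limit 127.
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 weight tensor."""
    return q.astype(np.float32) * scale

# Quantize a small random weight matrix and measure the round-trip error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
max_err = float(np.max(np.abs(w - w_hat)))
```

Storing `q` (1 byte per weight) plus one scale cuts memory roughly 4x versus float32, at the cost of a bounded rounding error of at most half the scale per weight.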