GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices

📅 2025-02-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Deploying generative AI (GenAI) on resource-constrained edge devices faces significant challenges in latency, privacy, and energy efficiency. Method: This paper systematically surveys software-hardware co-design techniques for lightweight GenAI deployment and proposes a three-part taxonomy (software optimization, hardware acceleration, and domain-specific frameworks) alongside a practical technology selection roadmap. Techniques covered include model pruning and quantization, knowledge distillation, sparse computation, heterogeneous acceleration (e.g., NPU/GPU), and lightweight frameworks (e.g., TinyLLM) with compiler-level optimizations. Contribution/Results: Based on a structured review of 200+ state-of-the-art works, the paper delineates the applicability boundaries and performance trade-offs of each technical pathway, offering systematic, deployment-oriented guidance for edge GenAI that can serve as a design reference for real-world edge AI systems.
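To make two of the software techniques named above concrete, the following is a minimal sketch of symmetric per-tensor int8 quantization and magnitude pruning on a plain list of weights. It is an illustration under simplifying assumptions, not the survey's method: the function names (`quantize_int8`, `dequantize`, `prune_by_magnitude`) and the sample weights are hypothetical, and real deployments typically rely on a framework's quantization and pruning tooling rather than hand-written code.

```python
# Illustrative sketch only. All names and values here are hypothetical
# and are not taken from the survey.

def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices sorted by absolute value; the first k are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.50, -1.27, 0.03, 0.90, -0.02, 0.40]
q, scale = quantize_int8(weights)          # integer codes plus one float scale
restored = dequantize(q, scale)            # approximate original weights
pruned = prune_by_magnitude(weights, 0.5)  # half of the weights set to zero
```

The sketch shows the trade-off the summary refers to: quantization stores each weight in 8 bits at the cost of a rounding error of at most half a quantization step, while pruning removes the weights that contribute least by magnitude and leaves a sparse tensor that sparse kernels can exploit.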

📝 Abstract

Generative Artificial Intelligence (GenAI) applies models and algorithms such as Large Language Models (LLMs) and Foundation Models (FMs) to generate new data. As a promising approach, GenAI enables advanced capabilities in various applications, including text generation and image processing. In current practice, GenAI algorithms run mainly on cloud servers, which leads to high latency and raises security concerns. These challenges motivate deploying GenAI algorithms directly on edge devices. However, the large size of such models and their significant computational requirements pose obstacles to deployment in resource-constrained systems. This survey provides a comprehensive overview of recently proposed techniques that optimize GenAI for efficient deployment on resource-constrained edge devices. To this end, it highlights three main categories for bringing GenAI to the edge: software optimization, hardware optimization, and frameworks. The main takeaway for readers is a clear roadmap for designing, implementing, and refining GenAI systems for real-world deployment on edge devices.

Problem

Research questions and friction points this paper is trying to address.

Optimizing GenAI for edge deployment

Reducing latency and security risks

Addressing resource constraints in edge devices

Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveys techniques that optimize GenAI for edge devices

Covers both software and hardware optimization

Reviews frameworks for edge deployment

🔎 Similar Papers

An Overview and Solution for Democratizing AI Workflows at the Network Edge