🤖 AI Summary
Deploying generative AI (GenAI) on resource-constrained edge devices faces significant challenges in latency, privacy, and energy efficiency. Method: This paper systematically surveys software–hardware co-design techniques for lightweight GenAI deployment. It proposes the first three-dimensional taxonomy, spanning software optimization, hardware acceleration, and domain-specific frameworks, together with a practical technology selection roadmap. Techniques covered include model pruning and quantization, knowledge distillation, sparse computation, heterogeneous acceleration (e.g., NPUs and GPUs), and lightweight frameworks (e.g., TinyLLM) with compiler-level optimizations. Contribution/Results: Based on a structured review of 200+ state-of-the-art works, the paper delineates the applicability boundaries and performance trade-offs of each technical pathway, filling a gap in systematic, deployable guidance for edge GenAI. It offers a design reference and implementation benchmark for real-world edge AI systems.
📝 Abstract
Generative Artificial Intelligence (GenAI) applies models and algorithms such as Large Language Models (LLMs) and Foundation Models (FMs) to generate new data. As a promising approach, GenAI enables advanced capabilities in various applications, including text generation and image processing. In current practice, GenAI algorithms run mainly on cloud servers, which leads to high latency and raises security concerns. These challenges motivate deploying GenAI algorithms directly on edge devices. However, the large size of such models and their significant computational requirements make them difficult to deploy on resource-constrained systems. This survey provides a comprehensive overview of recently proposed techniques that optimize GenAI for efficient deployment on resource-constrained edge devices. To this end, it highlights three main categories of approaches for bringing GenAI to the edge: software optimization, hardware optimization, and frameworks. The main takeaway for readers is a clear roadmap for designing, implementing, and refining GenAI systems for real-world deployment on edge devices.