🤖 AI Summary
A significant computational gap impedes the deployment of artificial intelligence on embedded systems. Method: This paper presents a systematic survey of efficient deep learning infrastructure for embedded AI, covering network design, model compression, on-device learning, lightweight large language models, and software-hardware co-optimization. It introduces a seven-dimensional unified analytical framework that integrates the training-inference, algorithm-application, and software-hardware dimensions across the full lifecycle. Contribution/Results: The survey identifies emerging directions, including integrated sensing, communication, and intelligence (ISCI) and neural-symbolic collaboration, and synthesizes over 100 representative works, such as neural architecture search (NAS), pruning, quantization, knowledge distillation, incremental learning, MoE-based model lightweighting, TinyML compilers, in-memory computing architectures, and RISC-V-based AI accelerators. It establishes a reproducible technology-evolution map and delivers the first comprehensive, system-level introductory guide and practical deployment roadmap for embedded AI.
📝 Abstract
Deep neural networks (DNNs) have recently achieved impressive success across a wide range of real-world vision and language processing tasks, from image classification to downstream vision tasks such as object detection, tracking, and segmentation. However, well-established DNNs, despite maintaining superior accuracy, have grown ever deeper and wider and thus demand prohibitive computational resources for both training and inference. This trend widens the computational gap between computation-intensive DNNs and resource-constrained embedded computing systems, making it challenging to deploy powerful DNNs on real-world embedded computing systems towards ubiquitous embedded intelligence. To alleviate this computational gap and enable ubiquitous embedded intelligence, this survey discusses recent efficient deep learning infrastructures for embedded computing systems, spanning from training to inference, from manual to automated design, from convolutional neural networks to transformers and vision transformers, from vision models to large language models, from software to hardware, and from algorithms to applications. Specifically, we examine these infrastructures through the lens of (1) efficient manual network design, (2) efficient automated network design, (3) efficient network compression, (4) efficient on-device learning, (5) efficient large language models, (6) efficient deep learning software and hardware, and (7) efficient intelligent applications, all for embedded computing systems. We also envision promising future directions and trends that have the potential to deliver more ubiquitous embedded intelligence.
We believe this survey can shed light on future research and help researchers quickly get started in this emerging field.