On Accelerating Edge AI: Optimizing Resource-Constrained Environments

📅 2025-01-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying AI models on resource-constrained edge devices remains challenging due to stringent latency, energy, and memory constraints. Method: This work proposes a unified AI acceleration paradigm integrating model compression, hardware-aware neural architecture search (NAS), and compiler-level deployment optimization. Specifically, it introduces: (1) a hierarchical knowledge distillation framework for large language models; (2) a lightweight, energy- and accuracy-aware hierarchical NAS; and (3) structural pruning applied during pretraining, identified as an open research direction. The approach combines pruning, quantization, tensor decomposition, and compilation backends (TVM, TensorRT, OpenVINO), augmented with neuro-symbolic techniques. Contribution/Results: Evaluated across diverse edge hardware, the method achieves 2–5× lower inference latency, 40–65% reduced energy consumption, and 3–8× smaller models, while preserving state-of-the-art accuracy. The work delivers a reproducible methodology and practical guidelines for cross-platform edge AI deployment.
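The knowledge distillation component mentioned in the summary can be illustrated with a minimal sketch of the standard temperature-scaled soft-target loss; the function names and example logits below are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature T > 1 softens the distribution, exposing "dark knowledge".
    z = logits / T
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 as is conventional so gradients keep their magnitude.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([3.0, 1.0, 0.2])   # illustrative teacher logits
student = np.array([2.5, 1.2, 0.3])   # illustrative student logits
loss = distillation_loss(student, teacher)
```

In practice this soft-target term is mixed with the ordinary cross-entropy against hard labels; the sketch shows only the distillation term.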

📝 Abstract
Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations. In this survey, we present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints. First, we examine model compression techniques (pruning, quantization, tensor decomposition, and knowledge distillation) that streamline large models into smaller, faster, and more efficient variants. Next, we explore Neural Architecture Search (NAS), a class of automated methods that discover architectures inherently optimized for particular tasks and hardware budgets. We then discuss compiler and deployment frameworks, such as TVM, TensorRT, and OpenVINO, which provide hardware-tailored optimizations at inference time. By integrating these three pillars into unified pipelines, practitioners can achieve multi-objective goals, including latency reduction, memory savings, and energy efficiency, all while maintaining competitive accuracy. We also highlight emerging frontiers in hierarchical NAS, neuro-symbolic approaches, and advanced distillation tailored to large language models, underscoring open challenges such as pre-training pruning for massive networks. Our survey offers practical insights, identifies current research gaps, and outlines promising directions for building scalable, platform-independent frameworks to accelerate deep learning models at the edge.
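Two of the compression techniques the abstract surveys, magnitude pruning and uniform quantization, can be sketched in a few lines of NumPy. This is a generic illustration under common textbook definitions (unstructured pruning, symmetric per-tensor int8 scaling), not the paper's implementation:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_int8(weights):
    """Symmetric uniform quantization to int8 with one per-tensor scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

# Illustrative 2x3 weight matrix
w = np.array([[0.8, -0.05, 0.3],
              [-0.02, 1.2, -0.4]])
pruned = magnitude_prune(w, sparsity=0.5)          # half the entries zeroed
q, scale = quantize_int8(pruned)                   # int8 codes plus a scale
dequant = q.astype(np.float32) * scale             # approximate reconstruction
```

Real pipelines apply such transforms layer by layer and fine-tune afterwards to recover accuracy; the compilers the abstract lists (TVM, TensorRT, OpenVINO) then exploit the resulting sparsity and low-precision arithmetic on the target hardware.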
Problem

Research questions and friction points this paper is trying to address.

Edge Computing
Deep Learning
Resource Constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Compression
Neural Architecture Search (NAS)
Cross-platform Optimization
Jacob Sander
Department of Computer Science, University of West Florida, Pensacola, FL, USA
Achraf Cohen
Department of Computer Science, University of West Florida, Pensacola, FL, USA
Venkat R. Dasari
DEVCOM Army Research Laboratory, Aberdeen Proving Ground, MD, USA
Brent Venable
Department of Computer Science, University of West Florida, Institute for Human & Machine Cognition, Pensacola, FL, USA
Brian Jalaian
bjalaian@uwf.edu
Deep Learning
Large Language Models
Agentic AI
Trustworthy AI
Optimization