A Comparative Study of CNN Optimization Methods for Edge AI: Exploring the Role of Early Exits

📅 2026-04-16

🤖 AI Summary

Deploying deep neural networks on edge devices entails balancing accuracy, latency, and resource constraints. This work presents a deployment-oriented evaluation on real edge hardware that compares static compression techniques (pruning and quantization) with dynamic early-exit mechanisms, all run through ONNX-based inference pipelines. The results show that static methods consistently reduce memory footprint, while early-exit strategies achieve input-adaptive computational savings that static methods cannot match. Combining both approaches reduces inference latency and memory consumption at the same time with minimal accuracy loss, which points to the two families being complementary for edge deployment.

📝 Abstract

Deploying deep neural networks on edge devices requires balancing accuracy, latency, and resource constraints under realistic execution conditions. To fit models within these constraints, two broad strategies have emerged: static compression techniques such as pruning and quantization, which permanently reduce model size, and dynamic approaches such as early-exit mechanisms, which adapt computational cost at runtime. While both families are widely studied in isolation, they are rarely compared under identical conditions on physical hardware. This paper presents a unified deployment-oriented comparison of static compression and dynamic early-exit mechanisms, evaluated on real edge devices using ONNX-based inference pipelines. Our results show that static and dynamic techniques offer fundamentally different trade-offs for edge deployment. While pruning and quantization deliver consistent memory footprint reduction, early-exit mechanisms enable input-adaptive computation savings that static methods cannot match. Their combination proves highly effective, simultaneously reducing inference latency and memory usage with minimal accuracy loss, expanding what is achievable at the edge.
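To make "input-adaptive computation" concrete, the following is a minimal sketch of confidence-thresholded early-exit inference. It is not the paper's implementation: the exit criterion (maximum softmax probability), the `threshold` value, and the names `early_exit_predict` and `stages` are assumptions chosen for illustration, and the toy blocks stand in for real CNN stages.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Each stage pairs a backbone block with the exit head attached after it.
# Both are hypothetical stand-ins for real network components.
Stage = Tuple[Callable[[Vector], Vector], Callable[[Vector], Vector]]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    x: Vector, stages: Sequence[Stage], threshold: float = 0.9
) -> Tuple[int, int, float]:
    """Run stages in order and stop at the first sufficiently confident exit.

    Returns (predicted_class, exit_index, confidence). The final stage
    always answers, so every input gets a prediction.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    features = x
    last = len(stages) - 1
    for i, (block, head) in enumerate(stages):
        features = block(features)       # compute only as deep as needed
        probs = softmax(head(features))
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return probs.index(confidence), i, confidence
    raise AssertionError("unreachable: the last stage always returns")


# Toy two-stage model: each block doubles the features, each head reads
# the features directly as two-class logits.
def double(f: Vector) -> Vector:
    return [2.0 * v for v in f]


def identity_head(f: Vector) -> Vector:
    return list(f)


toy_stages = [(double, identity_head), (double, identity_head)]

easy = early_exit_predict([3.0, 0.0], toy_stages)  # confident at the first exit
hard = early_exit_predict([0.5, 0.0], toy_stages)  # falls through to the last exit
```

In this toy setup the "easy" input leaves at the first exit and skips the second block entirely, while the "hard" input pays for the full network. Static compression, by contrast, shrinks every block regardless of the input, which is why the two approaches can be combined.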

Problem

Research questions and friction points this paper is trying to address.

Edge AI

Model Compression

Early Exit

Resource Constraints

Deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Early Exit

Model Compression

Edge AI