🤖 AI Summary
Problem: Adaptive model deployment on heterogeneous edge devices remains challenging because existing SuperNet approaches require cumbersome model-aware design and time-consuming hardware-aware profiling. Method: This paper proposes AutoTailor, the first end-to-end automated SuperNet deployment framework. It uses computation-graph-guided compilation to automatically transform user-provided models into SuperNets, and it integrates learning-free latency and accuracy predictors for low-overhead performance estimation and model specialization across hardware. Contributions/Results: Compared to state-of-the-art methods, the framework reduces the lines of code needed for SuperNet construction by 11–27×, cuts hardware-aware profiling cost by at least 11×, and achieves up to 15.60% absolute accuracy improvement and up to 60.03% latency reduction.
📝 Abstract
On-device machine learning (ML) has become a fundamental component of emerging mobile applications. Adaptive model deployment delivers efficient inference across heterogeneous device capabilities and performance requirements by customizing neural architectures. SuperNet-based approaches offer a promising solution by generating a large number of model variants from a single pre-trained ML model. However, applying SuperNets in existing frameworks requires tedious model-aware development and time-consuming hardware-aware profiling, which limits their practical adoption.
We present AutoTailor, the first framework to enable automated, end-to-end SuperNet-based adaptive model deployment for edge devices. Instead of relying on manual SuperNet construction, AutoTailor employs a computation-graph-guided compilation approach to automatically transform user-provided ML models into SuperNets. To support efficient specialization, AutoTailor incorporates learning-free latency and accuracy predictors, enabling low-cost yet accurate performance prediction. Our extended evaluations demonstrate that AutoTailor reduces the lines of code for SuperNet construction by 11–27×, decreases hardware-aware profiling costs by at least 11×, and achieves up to 15.60% absolute accuracy improvement and 60.03% latency reduction compared to state-of-the-art approaches across diverse models and devices.
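The abstract does not describe how the learning-free latency predictor works internally. One common learning-free technique, sketched here purely for illustration and not as AutoTailor's actual method, is a latency lookup table: each candidate operator configuration is profiled once on the target device, and a subnet's latency is estimated as the sum of its per-operator entries, with no trained regressor involved. The layer names, width multipliers, latency values, and function names below are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical latency table in milliseconds, keyed by (layer name, width multiplier).
# On a real device each entry would be measured once; these values are made up.
LatencyTable = Dict[Tuple[str, float], float]

LAYERS: List[str] = ["conv1", "conv2"]
WIDTHS: List[float] = [0.5, 1.0]
TABLE: LatencyTable = {
    ("conv1", 0.5): 1.0, ("conv1", 1.0): 2.0,
    ("conv2", 0.5): 1.5, ("conv2", 1.0): 3.0,
}


def predict_latency(config: Dict[str, float], table: LatencyTable) -> float:
    """Learning-free estimate: sum the profiled latency of each chosen operator."""
    return sum(table[(layer, width)] for layer, width in config.items())


def enumerate_subnets(layers: List[str], widths: List[float]) -> List[Dict[str, float]]:
    """Enumerate every width assignment for the elastic layers."""
    return [dict(zip(layers, choice)) for choice in product(widths, repeat=len(layers))]


def feasible_subnets(
    layers: List[str], widths: List[float], table: LatencyTable, budget_ms: float
) -> List[Tuple[float, Dict[str, float]]]:
    """Return (predicted latency, config) pairs within the budget, fastest first."""
    scored = [(predict_latency(c, table), c) for c in enumerate_subnets(layers, widths)]
    feasible = [(lat, c) for lat, c in scored if lat <= budget_ms]
    # Sort on latency only so ties never try to compare dicts.
    return sorted(feasible, key=lambda pair: pair[0])
```

With a 3.5 ms budget, `feasible_subnets(LAYERS, WIDTHS, TABLE, 3.5)` keeps the two subnets predicted at 2.5 ms and 3.5 ms. Summing per-operator entries ignores inter-operator effects such as kernel fusion or cache contention, which is one reason a practical predictor may need device-specific corrections.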