🤖 AI Summary
Existing approaches predict the energy consumption and latency of deep learning models on microcontrollers (MCUs) from coarse-grained proxies such as multiply-accumulate operations (MACs), which leads to inaccurate predictions or extensive data collection. This work proposes InstMeter, the first linear predictor built upon instruction-level clock cycles, which estimates both energy and latency accurately with a strongly linear model. InstMeter is evaluated on mainstream MCU architectures, including ARM Cortex-M and RISC-V. Compared with state-of-the-art predictors, it requires 100× less training data for energy and 10× less for latency, while lowering prediction errors by 3× and 6.5×, respectively. In neural architecture search (NAS), its more reliable hardware-aware feedback lets the search fully exploit the energy budget and find models with higher inference accuracy.
📝 Abstract
Deep learning (DL) models can now run on microcontrollers (MCUs). Through neural architecture search (NAS), we can search for DL models that meet the constraints of MCUs. Among these constraints, the energy and latency costs of model inference are critical metrics. To predict them, existing research relies on coarse proxies such as multiply-accumulate operations (MACs) and the model's input parameters, which often results in inaccurate predictions or requires extensive data collection. In this paper, we propose InstMeter, a predictor that leverages MCUs' clock cycles to accurately estimate the energy and latency of DL models. Clock cycles are a fundamental metric of MCU operation and directly determine energy and latency costs. Furthermore, a unique property of our predictor is its strong linearity, which allows it to be both simple and accurate. We thoroughly evaluate InstMeter under different scenarios, MCUs, and software settings. Compared with state-of-the-art studies, InstMeter reduces the energy and latency prediction errors by $3\times$ and $6.5\times$, respectively, while requiring $100\times$ and $10\times$ less training data. In the NAS scenario, InstMeter can fully exploit the energy budget, identifying optimal DL models with higher inference accuracy. We also evaluate InstMeter's generalization performance through experiments on three ARM MCUs (Cortex-M4, M7, M33) and one RISC-V-based MCU (ESP32-C3), and across different compilation options (-Os, -O2), GCC versions (v7.3, v10.3), application scenarios (keyword spotting, image recognition), dynamic voltage and frequency scaling, temperatures (21°C, 43°C), and software settings (TFLMv2.4, TFLMvCI). We will open-source our code and the MCU-specific benchmark datasets.
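To make the idea of a linear, cycle-based predictor concrete, here is a minimal sketch in Python. It is not the authors' implementation: it only assumes that energy can be modelled as `slope * cycles + intercept` and fits those two coefficients with ordinary least squares. The function names (`fit_line`, `predict`) and every number are hypothetical and synthetic, chosen purely for illustration, and are not measurements from the paper.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(cycles: float, slope: float, intercept: float) -> float:
    """Apply the fitted linear model to a new cycle count."""
    return slope * cycles + intercept


# Hypothetical training set: (clock cycles per inference, measured energy in uJ).
# Synthetic values that lie exactly on a line; NOT data from the paper.
train_cycles = [1.0e6, 2.0e6, 4.0e6, 8.0e6]
train_energy_uj = [55.0, 105.0, 205.0, 405.0]

slope, intercept = fit_line(train_cycles, train_energy_uj)

# Estimate the energy of an unseen model from its cycle count alone.
estimate_uj = predict(3.0e6, slope, intercept)
print(f"slope={slope:.3e} uJ/cycle, intercept={intercept:.2f} uJ, "
      f"estimate={estimate_uj:.2f} uJ")
```

The same two functions could be fitted against latency measurements instead of energy. With only two coefficients to estimate, a handful of measured models is enough to determine the fit in this toy setting, which illustrates why a strongly linear predictor can get by with little training data.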