🤖 AI Summary
Power consumption optimization for embedded sensors in TinyML applications remains challenging due to stringent energy constraints on resource-constrained edge devices.
Method: This work presents the first co-deployment of a depth-first convolutional neural network (CNN) on the Intelligent Sensor Processing Unit (ISPU) of an STMicroelectronics inertial measurement unit (IMU). We propose three key innovations: (1) an end-to-edge CNN deployment framework optimized specifically for IMU-level ISPU hardware; (2) a dynamic early-exit mechanism to eliminate redundant inference computations; and (3) a hierarchical, heterogeneous ISPU–MCU collaborative execution architecture.
Results: Evaluated on the STM32 NUCLEO-F411RE platform, our approach reduces average operating current to 4.8 mA—yielding an 11% power saving over MCU-only inference—while preserving full classification accuracy. This work establishes a scalable hardware–algorithm co-design paradigm for ultra-low-power sensing and edge intelligence.
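The early-exit hand-off between the in-sensor ISPU and the MCU can be sketched as below. This is an illustrative toy, not the paper's code: the names (`ispu_head_infer`, `mcu_tail_infer`, `CONF_THRESHOLD`) and the stand-in score computations are assumptions, standing in for the first CNN layers on the ISPU and the remaining layers on the MCU.

```c
#include <assert.h>
#include <stddef.h>

#define N_CLASSES 4
#define CONF_THRESHOLD 0.9f

/* Toy stand-in for the CNN head on the ISPU: here it just forwards
   precomputed per-class scores, in place of the first conv layers. */
static void ispu_head_infer(const float *head_scores, float out[N_CLASSES]) {
    for (int i = 0; i < N_CLASSES; ++i) out[i] = head_scores[i];
}

/* Toy stand-in for the remaining layers on the MCU: argmax over
   the scores handed off by the sensor. */
static int mcu_tail_infer(const float scores[N_CLASSES]) {
    int best = 0;
    for (int i = 1; i < N_CLASSES; ++i)
        if (scores[i] > scores[best]) best = i;
    return best;
}

/* Early-exit dispatch: if the ISPU head is already confident enough,
   return its prediction and never wake the MCU (the power saving);
   otherwise fall through to the MCU for the rest of the network. */
int classify_with_early_exit(const float *head_scores, int *used_mcu) {
    float s[N_CLASSES];
    ispu_head_infer(head_scores, s);
    int best = 0;
    for (int i = 1; i < N_CLASSES; ++i)
        if (s[i] > s[best]) best = i;
    if (s[best] >= CONF_THRESHOLD) {
        *used_mcu = 0;          /* early exit: computation stops on the sensor */
        return best;
    }
    *used_mcu = 1;              /* low confidence: hand off to the MCU */
    return mcu_tail_infer(s);
}
```

On a windowed classification workload, most windows are "easy" and exit on the sensor, which is where the reported average-current reduction comes from.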
📝 Abstract
Tiny Machine Learning (TinyML) is a recent research field that aims to integrate Machine Learning (ML) into embedded devices with limited memory, computation, and energy. A new branch of TinyML has emerged that pushes ML directly into the sensors themselves, further reducing the power consumption of embedded devices. Interestingly, despite the state-of-the-art performance of Convolutional Neural Networks (CNNs) on many tasks, no solution in the current literature optimizes their implementation to run directly on sensors. In this paper, we introduce, for the first time in the literature, the optimized design and implementation of Depth-First CNNs running on the Intelligent Sensor Processing Unit (ISPU) within an Inertial Measurement Unit (IMU) by STMicroelectronics. Our approach partitions the CNN between the ISPU and the microcontroller (MCU) and employs an Early-Exit mechanism that stops the computation on the IMU once sufficient confidence in the result is reached, significantly reducing power consumption. On a NUCLEO-F411RE board, this solution achieves an average current consumption of 4.8 mA, an 11% reduction compared to the regular inference pipeline on the MCU, with no loss of accuracy.
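The depth-first execution style mentioned above can be illustrated with a minimal sketch, under simplifying assumptions (a chain of two 1-D valid convolutions with kernel size 3, no caching of overlapping intermediates; `depth_first_conv2` and all names are hypothetical, not the paper's API). The point it shows: layer-by-layer execution materializes a full intermediate feature map, while depth-first execution keeps only the few intermediate values each output needs, which is what lets a CNN fit in a sensor-side processor's few kilobytes of RAM.

```c
#include <assert.h>

#define K 3  /* kernel size of both layers in this toy chain */

/* One 1-D valid-convolution tap: dot product of K inputs and weights. */
static float conv_tap(const float *in, const float w[K]) {
    float acc = 0.0f;
    for (int i = 0; i < K; ++i) acc += in[i] * w[i];
    return acc;
}

/* Depth-first two-layer pipeline: for each final output position,
   compute only the K layer-1 values it depends on, then the layer-2
   tap, never allocating the whole layer-1 feature map. A real
   implementation would cache the overlapping layer-1 values in a
   small ring buffer instead of recomputing them. */
void depth_first_conv2(const float *x, int n,
                       const float w1[K], const float w2[K],
                       float *out /* length n - 2*(K-1) */) {
    int n_out = n - 2 * (K - 1);
    for (int o = 0; o < n_out; ++o) {
        float mid[K]; /* only K intermediates are ever alive */
        for (int j = 0; j < K; ++j)
            mid[j] = conv_tap(x + o + j, w1);
        out[o] = conv_tap(mid, w2);
    }
}
```

With all-ones inputs and weights, each layer-1 value is 3 and each final output is 9, matching what layer-by-layer execution would produce with its full-length intermediate buffer.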