RockNet: Distributed Learning on Ultra-Low-Power Devices

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address privacy leakage, high latency, and severe resource constraints in distributed machine learning on ultra-low-power microcontroller units (MCUs), this paper proposes the first distributed framework enabling pretraining-free, end-to-end training of efficient neural networks on MCU clusters. The framework integrates a lightweight distributed optimization algorithm with a customized wireless multi-hop communication protocol to enable collaborative parallel training. Key innovations include gradient sparsification, local computation offloading, and communication-computation overlap, which together minimize per-node memory footprint, energy consumption, and latency. Evaluated on a real 20-node hardware testbed, the approach improves classification accuracy by up to 2× over state-of-the-art methods while reducing per-device memory usage, end-to-end latency, and energy consumption by up to 90%.
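The summary names gradient sparsification as one of the ingredients but gives no details. A common variant consistent with that description is top-k sparsification, where each node transmits only the k largest-magnitude gradient entries per round; the function below is an illustrative sketch, not code from the paper (all names and parameters are hypothetical):

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a gradient array.

    Illustrative top-k gradient sparsification: a node would transmit
    only the (index, value) pairs of the k largest entries, shrinking
    the per-round payload from grad.size floats to k pairs.
    """
    flat = grad.ravel()
    # indices of the k largest-magnitude entries (unsorted, O(n) selection)
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return idx, flat[idx], sparse.reshape(grad.shape)

rng = np.random.default_rng(0)
g = rng.normal(size=100)
idx, vals, g_sparse = topk_sparsify(g, k=10)
```

In a distributed setting, the dropped entries are typically accumulated locally in an error-feedback buffer and added back before the next round, so that no gradient mass is permanently lost.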

📝 Abstract
As Machine Learning (ML) becomes integral to Cyber-Physical Systems (CPS), there is growing interest in shifting training from traditional cloud-based to on-device processing (TinyML), for example, due to privacy and latency concerns. However, CPS often comprise ultra-low-power microcontrollers, whose limited compute resources make training challenging. This paper presents RockNet, a new TinyML method tailored for ultra-low-power hardware that achieves state-of-the-art accuracy in timeseries classification, such as fault or malware detection, without requiring offline pretraining. By leveraging that CPS consist of multiple devices, we design a distributed learning method that integrates ML and wireless communication. RockNet leverages all devices for distributed training of specialized compute-efficient classifiers that need minimal communication overhead for parallelization. Combined with tailored and efficient wireless multi-hop communication protocols, our approach overcomes the communication bottleneck that often occurs in distributed learning. Hardware experiments on a testbed with 20 ultra-low-power devices demonstrate RockNet's effectiveness. It successfully learns timeseries classification tasks from scratch, surpassing the accuracy of the latest approach for neural network microcontroller training by up to 2x. RockNet's distributed ML architecture reduces memory, latency, and energy consumption per device by up to 90% when scaling from one central device to 20 devices. Our results show that a tight integration of distributed ML, distributed computing, and communication enables, for the first time, training on ultra-low-power hardware with state-of-the-art accuracy.
Problem

Research questions and friction points this paper is trying to address.

Enabling distributed machine learning on ultra-low-power microcontroller devices
Overcoming communication bottlenecks in distributed learning for CPS
Achieving state-of-the-art timeseries classification accuracy without pretraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed learning method integrates ML with wireless communication
Specialized efficient classifiers minimize communication overhead for parallelization
Tailored wireless protocols overcome distributed learning communication bottlenecks
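The page states that each device trains a specialized, compute-efficient classifier and that parallelization needs minimal communication, without detailing the architecture. One common pattern matching that description is sketched below: every node fits a small linear classifier on its own disjoint slice of the feature vector, and only per-class score matrices (not raw features or dense gradients) are combined. All names, shapes, and hyperparameters are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def fit_node(X_slice, y, n_classes, lr=0.1, epochs=50):
    """Train a tiny softmax-regression classifier on one node's feature slice."""
    W = np.zeros((X_slice.shape[1], n_classes))
    for _ in range(epochs):
        logits = X_slice @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)          # softmax probabilities
        onehot = np.eye(n_classes)[y]
        W -= lr * X_slice.T @ (p - onehot) / len(y)  # gradient step
    return W

# synthetic, linearly separable toy task (illustrative only)
rng = np.random.default_rng(1)
n, d, n_nodes, n_classes = 200, 40, 4, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, n_classes, size=n)
X += np.eye(n_classes)[y] @ rng.normal(size=(n_classes, d))  # shift class means apart

# each node trains independently on its own disjoint feature slice
slices = np.array_split(np.arange(d), n_nodes)
models = [fit_node(X[:, s], y, n_classes) for s in slices]

# "communication": each node sends only an (n, n_classes) score matrix
scores = sum(X[:, s] @ W for s, W in zip(slices, models))
acc = (scores.argmax(axis=1) == y).mean()
```

The design point this illustrates is why per-node cost drops as the cluster grows: with d features split across N nodes, each device stores and updates only a (d/N × classes) weight block, and the wireless protocol carries class scores whose size is independent of d.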