Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Zeroth-order (ZO) optimization suffers from high computational overhead due to Gaussian random number generation, hindering its deployment on resource-constrained edge hardware such as FPGAs and ASICs. To address this algorithm–hardware mismatch, the authors propose PeZO, the first perturbation-efficient ZO framework. The approach rests on two key innovations: (1) random-number reuse strategies that drastically reduce how often fresh random numbers must be sampled, and (2) a hardware-friendly adaptive scaling technique that replaces costly Gaussian perturbations with a uniform distribution at minimal overhead. Implemented on FPGA, PeZO reduces the LUTs and FFs required for random number generation by 48.6% and 12.7%, respectively, and cuts power consumption by up to 86%, all without compromising training performance. This work establishes a deployable, energy-efficient hardware–algorithm co-design paradigm for ZO optimization on edge devices.

📝 Abstract
Zeroth-order (ZO) optimization is an emerging deep neural network (DNN) training paradigm that offers computational simplicity and memory savings. However, this seemingly promising approach faces a significant and long-ignored challenge: ZO requires generating a substantial number of Gaussian random numbers, which poses significant difficulties and even makes it infeasible for hardware platforms such as FPGAs and ASICs. In this paper, we identify this critical issue, which arises from the mismatch between algorithm and hardware designers. To address it, we propose PeZO, a perturbation-efficient ZO framework. Specifically, we design random number reuse strategies to significantly reduce the demand for random number generation and introduce a hardware-friendly adaptive scaling method to replace the costly Gaussian distribution with a uniform distribution. Our experiments show that PeZO reduces the LUTs and FFs required for random number generation by 48.6% and 12.7%, respectively, and saves up to 86% of power consumption, all without compromising training performance, making ZO optimization feasible for on-device training. To the best of our knowledge, we are the first to explore the potential of on-device ZO optimization, providing valuable insights for future research.
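To make the random-number cost concrete, a standard two-point ZO gradient estimator can be sketched as below. The function names `zo_gradient` and `uniform_perturbation` are illustrative, and the √3 scaling (which matches the unit variance of a standard Gaussian, since Var(U[−a, a]) = a²/3) is a generic substitution, not the paper's adaptive scaling method:

```python
import numpy as np

def zo_gradient(loss_fn, theta, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate.

    Each call draws a fresh perturbation vector the size of the model
    parameters -- this per-step sampling is the random-number cost that
    PeZO targets.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)  # Gaussian perturbation direction
    # Finite-difference slope of the loss along direction z.
    g = (loss_fn(theta + mu * z) - loss_fn(theta - mu * z)) / (2 * mu)
    return g * z  # scalar slope times direction = gradient estimate

def uniform_perturbation(shape, rng):
    """Generic Gaussian substitute: uniform samples scaled to unit
    variance (Var(U[-a, a]) = a^2 / 3, so a = sqrt(3)).  Uniform bits are
    far cheaper to produce in FPGA/ASIC logic than Gaussian samples."""
    a = np.sqrt(3.0)
    return rng.uniform(-a, a, size=shape)
```

In expectation, the estimate aligns with the true gradient; replacing `standard_normal` with a variance-matched uniform draw keeps the estimator's scale while swapping in hardware-cheap randomness.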
Problem

Research questions and friction points this paper is trying to address.

Reducing Gaussian random number generation in ZO optimization
Making ZO optimization feasible for FPGAs and ASICs
Minimizing hardware resource usage without performance loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random number reuse reduces Gaussian generation demand
Hardware-friendly uniform distribution replaces costly Gaussian
PeZO framework enables efficient on-device ZO training
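One way a reuse strategy like the first bullet could work is sketched below: pre-generate a small pool of perturbation vectors once, then cycle through them with cheap per-step sign flips instead of sampling a fresh full-size vector every iteration. `PerturbationPool` is a hypothetical illustration of the general idea, not the paper's exact mechanism:

```python
import numpy as np

class PerturbationPool:
    """Illustrative random-number reuse: a fixed pool of perturbation
    vectors is generated once, then recycled.  Per step, only one random
    bit (a sign flip) is drawn, instead of `dim` Gaussian samples."""

    def __init__(self, dim, pool_size=16, seed=0):
        rng = np.random.default_rng(seed)
        # One-time cost: pool_size * dim Gaussian samples.
        self.pool = rng.standard_normal((pool_size, dim))
        self.step = 0

    def next(self, rng):
        # Cycle through the pool; reuse vectors across steps.
        z = self.pool[self.step % len(self.pool)]
        self.step += 1
        # One bit of fresh randomness per step to decorrelate reuse.
        sign = 1.0 if rng.random() < 0.5 else -1.0
        return sign * z
```

With a pool of 16 vectors, the per-step sampling cost drops from `dim` Gaussian draws to a single random bit, which is the kind of reduction that makes compact FPGA/ASIC random-number generators viable.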