OnDA: On-device Channel Pruning for Efficient Personalized Keyword Spotting

📅 2026-02-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses distribution shift in keyword spotting (KWS) caused by variations across users and environments. It proposes an on-device joint optimization framework that integrates weight adaptation with online structured channel pruning, enabling personalized model compression and efficient inference by combining data-agnostic and data-aware pruning criteria within a self-learning pipeline tailored to individual users. The system supports online training and inference directly on embedded GPUs. Evaluated on the HeySnips and HeySnapdragon datasets, the method achieves a model compression ratio of 9.63× at matched task performance. On the Jetson Orin Nano platform, it reduces training and inference latency by 1.52× and 1.64×, respectively, while decreasing energy consumption by 1.57× and 1.77×.

📝 Abstract
Always-on keyword spotting (KWS) demands on-device adaptation to cope with user- and environment-specific distribution shifts under tight latency and energy budgets. This paper proposes, for the first time, coupling weight adaptation (i.e., on-device training) with architectural adaptation, in the form of online structured channel pruning, for personalized on-device KWS. Starting from a state-of-the-art self-learning personalized KWS pipeline, we compare data-agnostic and data-aware pruning criteria applied on in-field pseudo-labelled user data. On the HeySnips and HeySnapdragon datasets, we achieve up to 9.63x model-size compression with respect to unpruned baselines at iso-task performance, measured as the accuracy at 0.5 false alarms per hour. When deploying our adaptation pipeline on a Jetson Orin Nano embedded GPU, we achieve up to 1.52x/1.57x and 1.64x/1.77x latency and energy-consumption improvements during online training/inference compared to weights-only adaptation.
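The abstract contrasts data-agnostic pruning criteria (which rank channels from the weights alone) with data-aware criteria (which use activation statistics from in-field pseudo-labelled user data). The paper does not publish its implementation, so the sketch below is only an illustrative NumPy rendering of these two generic structured-pruning criteria: L1 weight-norm scoring versus mean-activation-magnitude scoring, followed by keeping the top-scoring output channels. All function names and the `keep_ratio` parameter are assumptions, not the authors' API.

```python
import numpy as np

def data_agnostic_scores(weights):
    """Score each output channel by the L1 norm of its filter.

    weights: array of shape (out_ch, in_ch, kh, kw).
    Needs no data, hence 'data-agnostic'.
    """
    return np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)

def data_aware_scores(activations):
    """Score each channel by its mean absolute activation.

    activations: array of shape (batch, out_ch, h, w), e.g. collected
    on pseudo-labelled user utterances during on-device adaptation.
    """
    return np.abs(activations).mean(axis=(0, 2, 3))

def prune_channels(weights, scores, keep_ratio):
    """Keep the highest-scoring fraction of output channels (structured pruning)."""
    keep = max(1, int(round(keep_ratio * weights.shape[0])))
    # Indices of the top-`keep` channels, returned in ascending order
    # so the pruned layer preserves the original channel ordering.
    kept = np.sort(np.argsort(scores)[::-1][:keep])
    return weights[kept], kept
```

Removing whole output channels (rather than individual weights) is what makes the pruning "structured": the pruned layer is a smaller dense convolution, so the latency and energy savings reported on the Jetson Orin Nano come for free at inference time without sparse-kernel support.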
Problem

Research questions and friction points this paper is trying to address.

keyword spotting
on-device adaptation
distribution shift
latency constraint
energy efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

on-device adaptation
channel pruning
personalized keyword spotting
online structured pruning
energy-efficient inference