ValSub: Subsampling Validation Data to Mitigate Forgetting during ASR Personalization

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
On-device personalized fine-tuning of ASR models causes catastrophic forgetting that degrades source-domain generalization, and the conventional remedy of measuring forgetting on a full validation set is infeasible on resource-constrained edge devices because of its storage and compute cost. This paper proposes a lightweight forgetting-monitoring framework based on validation-set subsampling: it combines distribution-matching subsampling with dynamic forgetting quantification to build a compact yet high-fidelity surrogate validation set, and adds an adaptive early-stopping strategy that uses the surrogate set to choose the number of fine-tuning epochs. Compared with random subsets of the same size, the method reduces the mean absolute error of forgetting estimation by 10.3%-60.7%, and across multiple forgetting thresholds it consistently tracks the behaviour of a 50x larger oracle (full) validation set.
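The distribution-matching subsampling step can be illustrated with a minimal sketch. The greedy mean-matching objective below is an illustrative assumption, not the paper's exact criterion: it picks k utterance feature vectors whose running mean stays closest to the full validation set's mean.

```python
# Illustrative sketch (assumed objective, not the paper's exact method):
# greedily select k feature vectors so the subset's mean feature vector
# stays as close as possible to the full validation set's mean.

def subsample_by_mean_matching(features, k):
    """Greedy pick of k feature vectors minimizing squared distance
    between the subset mean and the full-set mean."""
    n = len(features)
    full_mean = [sum(col) / n for col in zip(*features)]
    chosen, remaining = [], list(range(n))
    for _ in range(k):
        best_i, best_dist = None, float("inf")
        for i in remaining:
            cand = chosen + [features[i]]
            cand_mean = [sum(col) / len(cand) for col in zip(*cand)]
            dist = sum((a - b) ** 2 for a, b in zip(cand_mean, full_mean))
            if dist < best_dist:
                best_i, best_dist = i, dist
        chosen.append(features[best_i])
        remaining.remove(best_i)
    return chosen
```

In practice the features could be utterance embeddings or acoustic statistics; greedy selection is one simple way to approximate distribution matching when exhaustive subset search is intractable.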

📝 Abstract
Automatic Speech Recognition (ASR) is widely used within consumer devices such as mobile phones. Recently, personalization, or on-device model fine-tuning, has shown that adapting ASR models towards target user speech improves their performance on rare words or accented speech. Despite these gains, fine-tuning on user data (target domain) risks the personalized model forgetting knowledge about its original training distribution (source domain), i.e., catastrophic forgetting, leading to subpar general ASR performance. A simple and efficient approach to combat catastrophic forgetting is to measure forgetting via a validation set that represents the source-domain distribution. However, such validation sets are large and impractical for mobile devices. Towards this, we propose a novel method to subsample a substantially large validation set into a smaller one while maintaining the ability to estimate forgetting. We demonstrate the efficacy of such a dataset in mitigating forgetting by utilizing it to dynamically determine the ideal number of fine-tuning epochs. When measuring the deviations in per-user fine-tuning epochs against a 50x larger validation set (oracle), our method achieves a lower mean absolute error (3.39) compared to randomly selected subsets of the same size (3.78-8.65). Unlike random baselines, our method consistently tracks the oracle's behaviour across three different forgetting thresholds.
Problem

Research questions and friction points this paper is trying to address.

Mitigates catastrophic forgetting in ASR personalization
Subsamples validation data for mobile device efficiency
Dynamically determines ideal fine-tuning epochs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Subsampling validation set to reduce size
Dynamic fine-tuning epochs determination
Maintains source domain distribution representation
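The early-stopping idea above can be sketched as follows. This is a hedged illustration under assumed details: forgetting is proxied as the WER increase on the surrogate source-domain subset, and fine-tuning stops once it exceeds a threshold; the per-epoch WER values and names (`baseline_wer`, `threshold`) are illustrative, not taken from the paper.

```python
# Illustrative sketch (assumed details): stop fine-tuning when forgetting,
# measured as WER degradation on a compact surrogate validation set,
# exceeds a chosen threshold.

def forgetting(current_wer: float, baseline_wer: float) -> float:
    """Forgetting proxy: absolute WER increase on the source-domain subset."""
    return current_wer - baseline_wer

def fine_tune_with_early_stop(subset_wers, baseline_wer, threshold, max_epochs):
    """Return how many epochs to keep: the last epoch before the
    forgetting proxy crossed the threshold."""
    for epoch, wer in enumerate(subset_wers[:max_epochs], start=1):
        if forgetting(wer, baseline_wer) > threshold:
            return epoch - 1  # roll back to the last acceptable epoch
    return min(len(subset_wers), max_epochs)

# Mock per-epoch WERs on the surrogate subset (fabricated for illustration).
wers = [0.101, 0.104, 0.109, 0.118, 0.131]
epochs = fine_tune_with_early_stop(wers, baseline_wer=0.100,
                                   threshold=0.01, max_epochs=5)
```

Sweeping `threshold` over several values is how one would reproduce the paper's comparison of subset-driven epoch choices against the oracle validation set.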
Haaris Mehmood
Samsung R&D Institute UK (SRUK)
Karthikeyan Saravanan
Samsung R&D Institute UK (SRUK)
Pablo Peso Parada
AI Researcher - Samsung Research UK
signal processing, machine learning, open source hardware, audio, speech
David Tuckey
Samsung R&D Institute UK (SRUK)
Mete Ozay
Samsung R&D Institute UK (SRUK)
Gil Ho Lee
Samsung Electronics, South Korea
Jungin Lee
Samsung Electronics, South Korea
Seokyeong Jung
Samsung Electronics, South Korea