🤖 AI Summary
Low-resource automatic speech recognition (ASR) suffers from data scarcity and poor cross-lingual generalization; existing adaptation methods often rely on external resources, incur high computational costs, or lack support for test-time adaptation. To address these limitations, this work introduces Meta-In-Context Learning (Meta-ICL) to ASR for the first time, proposing a lightweight, fine-tuning-free adaptation framework. It uses k-nearest-neighbor (k-NN) retrieval to dynamically identify semantically similar speech samples and construct task-aware in-context examples, enabling zero-shot or few-shot cross-lingual transfer for Whisper. Evaluated on the ML-SUPERB multilingual benchmark, the approach substantially reduces the character error rate (CER) for low-resource languages, achieving an average improvement of 32.7%. These results demonstrate the method's effectiveness, efficiency, and scalability, all without parameter updates or external linguistic resources.
📝 Abstract
This paper presents Meta-Whisper, a novel approach to improving automatic speech recognition (ASR) for low-resource languages using the Whisper model. By leveraging Meta-In-Context Learning (Meta-ICL) and a k-nearest-neighbor (k-NN) algorithm for sample selection, Meta-Whisper enhances Whisper's ability to recognize speech in unfamiliar languages without extensive fine-tuning. Experiments on the ML-SUPERB dataset show that Meta-Whisper significantly reduces the Character Error Rate (CER) for low-resource languages compared to the original Whisper model. This method offers a promising solution for developing more adaptable multilingual ASR systems, particularly for languages with limited resources.
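The sample-selection step described above (retrieve the k labeled speech samples nearest to the query utterance, then supply them as in-context examples) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of cosine similarity, and the toy 2-D embeddings are all assumptions standing in for real Whisper encoder features and the paper's actual prompt format.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def knn_select(query_emb, pool_embs, k=3):
    """Indices of the k pool embeddings most similar to the query
    (hypothetical stand-in for Meta-Whisper's k-NN retrieval)."""
    ranked = sorted(range(len(pool_embs)),
                    key=lambda i: cosine(query_emb, pool_embs[i]),
                    reverse=True)
    return ranked[:k]

# Toy 2-D "speech embeddings" for a pool of labeled samples; in the
# real system these would come from the speech encoder.
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
transcripts = ["hello", "hallo", "bonjour", "hola"]

neighbors = knn_select([1.0, 0.05], pool, k=2)
# The retrieved (audio, transcript) pairs would then be prepended to
# Whisper's input as few-shot in-context examples.
context = [transcripts[i] for i in neighbors]
```

The key design point is that retrieval replaces gradient updates: adapting to a new language only requires a small pool of labeled samples to search over, so no Whisper parameters change at test time.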