Marco-ASR: A Principled and Metric-Driven Framework for Fine-Tuning Large-Scale ASR Models for Domain Adaptation

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address performance degradation in ASR models—particularly LLM-based ASR—under domain adaptation due to data mismatch and training complexity, this paper proposes a metric-driven fine-tuning framework. Methodologically, it introduces the first learning-rate scheduler adaptively calibrated via WER feedback, integrated with domain-aware data augmentation, multi-scale temporal transformations, and an anti-overfitting fine-tuning protocol. The framework unifies support for both conventional ASR and LLM-based ASR (e.g., Whisper, Qwen2-Audio) across architectures. Experiments on diverse multi-domain, multilingual, and variable-length benchmarks demonstrate that Whisper-Turbo achieves an average 23.6% relative WER reduction, while Qwen2-Audio exhibits markedly improved generalization and training stability. The core contributions are a metric-guided dynamic optimization paradigm and a generic, architecture-agnostic domain adaptation framework.
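The centerpiece described above is a learning-rate scheduler calibrated by WER feedback. The paper's exact update rule is not given here, so the following is a minimal sketch of one plausible realization: decay the learning rate whenever dev-set WER stops improving for a few evaluations. The class name, `decay`, `patience`, and `min_lr` parameters are all assumptions for illustration, not the authors' API.

```python
class WERGuidedScheduler:
    """Hypothetical sketch of a metric-driven LR scheduler.

    Decays the learning rate after `patience` consecutive dev-set
    evaluations without a WER improvement. All names and defaults
    here are assumptions, not taken from the paper.
    """

    def __init__(self, lr, decay=0.5, patience=2, min_lr=1e-7):
        self.lr = lr              # current learning rate
        self.decay = decay        # multiplicative decay factor
        self.patience = patience  # tolerated non-improving evals
        self.min_lr = min_lr      # floor to avoid vanishing updates
        self.best_wer = float("inf")
        self.bad_evals = 0

    def step(self, dev_wer):
        """Feed the latest dev-set WER; returns the (possibly decayed) LR."""
        if dev_wer < self.best_wer:
            self.best_wer = dev_wer
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                self.lr = max(self.lr * self.decay, self.min_lr)
                self.bad_evals = 0
        return self.lr


# Usage: WER improves, then plateaus for two evals, triggering a decay.
sched = WERGuidedScheduler(lr=1e-4, decay=0.5, patience=2)
print(sched.step(0.30))  # improvement: LR stays at 1e-4
print(sched.step(0.32))  # first non-improving eval: LR unchanged
print(sched.step(0.31))  # second non-improving eval: LR halved to 5e-5
```

This "reduce on plateau" shape is one common way to couple optimization to a task metric; the paper's scheduler may use a different calibration rule.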

📝 Abstract
Automatic Speech Recognition (ASR) models have achieved remarkable accuracy in general settings, yet their performance often degrades in domain-specific applications due to data mismatch and linguistic variability. This challenge is amplified for modern Large Language Model (LLM)-based ASR systems, whose massive scale and complex training dynamics make effective fine-tuning non-trivial. To address this gap, this paper proposes a principled and metric-driven fine-tuning framework for adapting both traditional and LLM-based ASR models to specialized domains. The framework emphasizes learning rate optimization based on performance metrics, combined with domain-specific data transformation and augmentation. We empirically evaluate our framework on state-of-the-art models, including Whisper, Whisper-Turbo, and Qwen2-Audio, across multi-domain, multilingual, and multi-length datasets. Our results not only validate the proposed framework but also establish practical protocols for improving domain-specific ASR performance while preventing overfitting.
Problem

Research questions and friction points this paper is trying to address.

Adapting ASR models to specialized domains effectively
Optimizing fine-tuning for large-scale LLM-based ASR systems
Preventing overfitting while improving domain-specific performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Metric-driven fine-tuning framework for domain adaptation
Learning rate optimization based on performance metrics
Domain-specific data transformation and augmentation techniques
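One listed contribution is domain-specific data augmentation with multi-scale temporal transformations. The paper's exact transforms are not specified here, so below is a minimal sketch of one standard instance, speed perturbation: resampling a waveform at several rate factors (e.g. 0.9x, 1.0x, 1.1x) to diversify speaking rates. The function names and the factor set are assumptions for illustration.

```python
def speed_perturb(samples, factor):
    """Resample a waveform by `factor` via linear interpolation.

    factor > 1.0 speeds speech up (shorter output); factor < 1.0
    slows it down. A simple pure-Python sketch, not the paper's code.
    """
    n = len(samples)
    m = max(1, round(n / factor))  # output length scales inversely
    out = []
    for i in range(m):
        pos = i * factor
        lo = min(int(pos), n - 1)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        # Linear interpolation between the two nearest input samples.
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def multi_scale_augment(samples, factors=(0.9, 1.0, 1.1)):
    """Return one perturbed copy per rate factor (multi-scale set)."""
    return [speed_perturb(samples, f) for f in factors]


# Usage: a 100-sample signal yields three temporal scales.
wave = [float(i) for i in range(100)]
views = multi_scale_augment(wave)
print([len(v) for v in views])  # roughly [111, 100, 91]
```

In practice one would resample with a DSP library and combine this with domain-matched noise or text normalization; this sketch only shows the multi-scale temporal idea.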