🤖 AI Summary
Off-the-shelf automatic speech recognition (ASR) models generalize poorly to domain-specific terminology and speaker accents, and the resulting low accuracy in industrial-scale CRM systems hampers downstream customer intent understanding. To address this, we propose a weakly supervised, domain-adaptive ASR fine-tuning framework tailored for CRM applications. Our method leverages noise-robust weak supervision to reduce reliance on high-quality transcriptions; integrates lightweight acoustic model adaptation with domain-aware speech preprocessing and linguistic adaptation techniques; and enables efficient transfer of generic ASR models to vertical CRM scenarios. Evaluated on real-world CRM voice data, our approach achieves an average 32.7% relative reduction in word error rate (WER). Deployed in production CRM infrastructure, it supports customer intent classification, entity typing, and personalized service generation, which indicates practical deployability and cross-domain generalizability.
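For readers unfamiliar with the metric, the following is a minimal sketch of how WER and a relative WER reduction are conventionally computed. It is not the evaluation code used in this work: the function names `word_error_rate` and `relative_wer_reduction` and the sample sentences are illustrative assumptions, and whitespace tokenization is a simplification.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Tokenization by whitespace is an assumption made for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction by which the adapted model lowers WER relative to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Invented example sentences, not data from the paper.
reference = "please renew my premium policy today"
baseline_wer = word_error_rate(reference, "please renew my premier policy to day")
adapted_wer = word_error_rate(reference, "please renew my premium policy to day")
reduction = relative_wer_reduction(baseline_wer, adapted_wer)
```

Under this definition, a relative reduction is measured against the baseline error rate rather than in absolute percentage points, so a model that moves from a WER of 0.20 to 0.15 has a 25% relative reduction.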
📝 Abstract
In the design of customer relationship management (CRM) systems, accurately identifying customer types and offering personalized services are key to enhancing customer satisfaction and loyalty. Doing so requires reliably recognizing what customers say and what they intend, yet general pre-trained automatic speech recognition (ASR) models perform poorly on industry-specific speech recognition tasks. To address this issue, we propose a method for fine-tuning industry-specific ASR models that significantly improves their performance in industry applications. Experimental results show that our method substantially strengthens the ASR model's role as a key supporting component of industry CRM systems, and the approach has been adopted in actual industrial applications.