ViMedCSS: A Vietnamese Medical Code-Switching Speech Dataset & Benchmark

📅 2026-02-13

📝 Abstract
Code-switching (CS), the insertion of English words such as drug names or procedure names into Vietnamese speech, is common in Vietnamese medical communication. It poses a challenge for Automatic Speech Recognition (ASR) systems, especially for low-resource languages like Vietnamese. Most current ASR systems struggle to correctly recognize English medical terms embedded in Vietnamese sentences, and no existing benchmark addresses this challenge. In this paper, we construct ViMedCSS, a 34-hour Vietnamese Medical Code-Switching Speech dataset containing 16,576 utterances. Each utterance includes at least one English medical term drawn from a curated bilingual lexicon covering five medical topics. Using this dataset, we evaluate several state-of-the-art ASR models and examine a range of fine-tuning strategies for improving medical term recognition, in order to identify the most effective approach on this dataset. Experimental results show that Vietnamese-optimized models perform better on general segments, while multilingual pretraining helps capture English insertions. Combining both approaches yields the best balance between overall and code-switched accuracy. This work provides the first benchmark for Vietnamese medical code-switching and offers insights into effective domain adaptation for low-resource, multilingual ASR systems.
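The abstract distinguishes overall accuracy from accuracy on the English insertions but does not state the exact metrics. The sketch below is a hedged illustration only. It assumes word error rate (WER) as the overall metric and a simple term-level recall over lexicon entries as the code-switched metric. It also assumes transcripts are already lowercased and punctuation-free, and that lexicon terms are single tokens. The function names (`edit_distance`, `wer`, `term_recall`, `is_code_switched`) are hypothetical and not taken from the paper.

```python
from typing import List, Set


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # delete a reference word
                          curr[j - 1] + 1,     # insert a hypothesis word
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Overall word error rate, normalized by reference length."""
    ref_words, hyp_words = ref.lower().split(), hyp.lower().split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


def is_code_switched(utt: str, lexicon: Set[str]) -> bool:
    """Hypothetical filter: True if the utterance contains at least one lexicon term."""
    return any(w in lexicon for w in utt.lower().split())


def term_recall(ref: str, hyp: str, lexicon: Set[str]) -> float:
    """Fraction of reference lexicon terms that also appear in the hypothesis.

    Returns NaN when the reference contains no lexicon term.
    """
    ref_terms = [w for w in ref.lower().split() if w in lexicon]
    if not ref_terms:
        return float("nan")
    hyp_words = set(hyp.lower().split())
    return sum(1 for t in ref_terms if t in hyp_words) / len(ref_terms)


if __name__ == "__main__":
    lexicon = {"paracetamol", "ibuprofen"}  # hypothetical two-entry lexicon
    ref = "bác sĩ kê paracetamol và ibuprofen"
    hyp = "bác sĩ kê para xê ta mon và ibuprofen"  # drug name split phonetically
    print(is_code_switched(ref, lexicon))          # True
    print(round(wer(ref, hyp), 3))                 # 0.667 (1 substitution + 3 insertions over 6 words)
    print(term_recall(ref, hyp, lexicon))          # 0.5
```

In this example, a model that transliterates "paracetamol" into Vietnamese-style syllables is penalized by both metrics. The term-level recall isolates the failure on the English insertion, while WER mixes it with errors on the surrounding Vietnamese words. This separation mirrors the abstract's contrast between general-segment and code-switched accuracy.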
Problem

Research questions and friction points this paper is trying to address.

code-switching
Automatic Speech Recognition
Vietnamese
medical terminology
low-resource languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

code-switching
Vietnamese medical ASR
low-resource ASR
multilingual pretraining
domain adaptation
Authors
Tung X. Nguyen
College of Engineering and Computer Science, VinUniversity, Vietnam; Center for AI Research, VinUniversity, Vietnam
Nhu Vo
College of Engineering and Computer Science, VinUniversity, Vietnam; University of Technology Sydney, Australia
Giang-Son Nguyen
College of Engineering and Computer Science, VinUniversity, Vietnam; Center for AI Research, VinUniversity, Vietnam
Duy Mai Hoang
College of Health Sciences, VinUniversity, Vietnam
Chien Dinh Huynh
College of Health Sciences, VinUniversity, Vietnam
Inigo Jauregi Unanue
University of Technology Sydney, Australia
Massimo Piccardi
Professor, University of Technology Sydney
natural language processing, computer vision, pattern recognition
Wray Buntine
Professor, VinUniversity
Machine Learning
Dung D. Le
College of Engineering and Computer Science, VinUniversity, Vietnam; Center for AI Research, VinUniversity, Vietnam