MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation

📅 2025-04-04
🤖 AI Summary
Multilingual medical speech translation has long suffered from data scarcity and a lack of systematic investigation, hindering cross-lingual clinical communication. To address this, we introduce MultiMed-ST—the first large-scale, multilingual medical speech translation dataset—covering Simplified/Traditional Chinese, English, German, French, and Vietnamese, with 290K annotated samples enabling all-to-all translation. We conduct the first comprehensive empirical study on many-to-many medical speech translation, establishing a unified evaluation framework spanning bilingual/multilingual, end-to-end/cascaded, and single-task/multi-task settings. We propose multilingual joint training with code-switching modeling, complemented by quantitative and qualitative error analysis. All resources—including the dataset, models, and code—are publicly released. Our work validates the feasibility of multilingual medical speech translation and identifies critical performance determinants: language-pair characteristics, architectural choices, and task design.

📝 Abstract
Multilingual speech translation (ST) in the medical domain enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we present the first systematic study on medical ST, to the best of our knowledge, by releasing MultiMed-ST, a large-scale ST dataset for the medical domain, spanning all translation directions in five languages: Vietnamese, English, German, French, and Traditional/Simplified Chinese, together with the models. With 290,000 samples, our dataset is the largest medical machine translation (MT) dataset and the largest many-to-many multilingual ST dataset across all domains. Secondly, we present the most extensive analysis study in ST research to date, including: empirical baselines, a bilingual vs. multilingual comparative study, an end-to-end vs. cascaded comparative study, a task-specific vs. multi-task sequence-to-sequence (seq2seq) comparative study, code-switch analysis, and quantitative-qualitative error analysis. All code, data, and models are available online: https://github.com/leduckhai/MultiMed-ST.
Problem

Research questions and friction points this paper is trying to address.

Language barriers impede clinical communication and patient care across languages
Specialized workforce shortages strain healthcare communication
Data scarcity and lack of systematic study hinder multilingual medical speech translation for diagnosis and treatment
Innovation

Methods, ideas, or system contributions that make the work stand out.

MultiMed-ST: a 290K-sample medical speech translation dataset spanning all translation directions in five languages
Comprehensive analysis including empirical baselines and bilingual/multilingual, end-to-end/cascaded, and single-task/multi-task comparative studies
Open-source release of code, data, and models
Khai Le-Duc
University of Toronto
Tuyen Tran
Deakin University
Bach Phan Tat
KU Leuven, Belgium
Nguyen Kim Hai Bui
Eötvös Loránd University
Quan Dang
Hanoi University of Science and Technology, Vietnam
Hung-Phong Tran
Hanoi University of Science and Technology, Vietnam
Thanh-Thuy Nguyen
HCMC Open University, Vietnam
Ly Nguyen
IÉSEG School of Management, France
Tuan-Minh Phan
Technische Universität Dortmund, Germany
Thi Thu Phuong Tran
University of Hertfordshire, United Kingdom
Chris Ngo
Knovel Engineering
Nguyen X. Khanh
UC Berkeley, United States
Thanh Nguyen-Tang
Johns Hopkins University