🤖 AI Summary
Prior work lacks a systematic comparison of large language models (LLMs) and conventional machine translation (MT) systems for translating medical consultation summaries from English into morphologically diverse languages (Arabic, Chinese, and Vietnamese), with an explicit distinction between patient-friendly and clinician-oriented texts.
Method: We evaluated state-of-the-art open-source and commercial LLMs (e.g., Llama, Qwen) alongside statistical and neural MT systems (e.g., Google Translate, DeepL, OpenNMT) using standard automatic metrics (BLEU, METEOR) on domain-specific medical summaries.
Contribution/Results: Conventional MT outperformed LLMs overall, especially on complex, terminology-dense clinical text, though LLMs approached MT quality on simplified Vietnamese and Chinese summaries. Arabic departed from this pattern: its translations improved as text complexity increased, an effect the study attributes to the language's morphology. Critically, no automated method ensured clinical accuracy. The study reveals nonlinear interactions between linguistic morphology, text type, and translation performance, and exposes fundamental limitations of generic evaluation metrics in capturing clinical relevance, underscoring the need for human expert validation and domain-specific fine-tuning.
📝 Abstract
This study evaluates how well large language models (LLMs) and traditional machine translation (MT) tools translate medical consultation summaries from English into Arabic, Chinese, and Vietnamese. It assesses both patient-friendly and clinician-focused texts using standard automated metrics. Results showed that traditional MT tools generally performed better, especially for complex texts, while LLMs showed promise, particularly in Vietnamese and Chinese, when translating simpler summaries. Arabic translations improved with complexity due to the language's morphology. Overall, while LLMs offer contextual flexibility, they remain inconsistent, and current evaluation metrics fail to capture clinical relevance. The study highlights the need for domain-specific training, improved evaluation methods, and human oversight in medical translation.