Disparities in Multilingual LLM-Based Healthcare Q&A

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies a significant cross-lingual factual alignment bias in multilingual large language models (LLMs) for medical question answering: mainstream models rely predominantly on English knowledge sources, leading to substantial degradation in factual consistency and knowledge coverage for non-English queries. To address this, we introduce MultiWikiHealthCare, the first multilingual medical evaluation benchmark covering English, German, Turkish, Chinese, and Italian, and propose a retrieval-augmented generation (RAG) framework with target-language context injection grounded in Wikipedia. Our systematic evaluation measures factual alignment between model outputs and multilingual reference sources. Empirically, we demonstrate that injecting target-language contextual knowledge significantly improves factual alignment for non-English languages (average +23.6%), mitigating culture-specific knowledge gaps and English-centric biases. This work establishes a new paradigm for equitable and reliable multilingual medical AI.

📝 Abstract
Equitable access to reliable health information is vital when integrating AI into healthcare. Yet information quality varies across languages, raising concerns about the reliability and consistency of multilingual Large Language Models (LLMs). We systematically examine cross-lingual disparities in pre-training sources and factual alignment in LLM answers for multilingual healthcare Q&A across English, German, Turkish, Chinese (Mandarin), and Italian. We (i) constructed Multilingual Wiki Health Care (MultiWikiHealthCare), a multilingual dataset from Wikipedia; (ii) analyzed cross-lingual healthcare coverage; (iii) assessed LLM response alignment with these references; and (iv) conducted a case study on factual alignment through the use of contextual information and Retrieval-Augmented Generation (RAG). Our findings reveal substantial cross-lingual disparities in both Wikipedia coverage and LLM factual alignment. Across LLMs, responses align more with English Wikipedia, even when the prompts are non-English. Providing contextual excerpts from non-English Wikipedia at inference time effectively shifts factual alignment toward culturally relevant knowledge. These results highlight practical pathways for building more equitable multilingual AI systems for healthcare.
Problem

Research questions and friction points this paper is trying to address.

Examining cross-lingual disparities in healthcare Q&A reliability
Assessing multilingual LLM factual alignment across five languages
Addressing Wikipedia coverage gaps and cultural relevance in AI responses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructed multilingual dataset from Wikipedia sources
Analyzed cross-lingual healthcare coverage disparities systematically
Used contextual RAG to improve factual alignment (see the sketch below)
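The core of the contextual-RAG case study, as described in the abstract, is to retrieve an excerpt from the target-language Wikipedia article and inject it into the prompt at inference time, so the model grounds its answer in culturally relevant knowledge instead of defaulting to English sources. The sketch below illustrates that prompt-construction step only; the function names, prompt wording, and the call_llm placeholder are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class RetrievedContext:
    """An excerpt retrieved from the target-language Wikipedia article."""
    language: str  # e.g. "de", "tr", "zh", "it"
    title: str     # article title in the target language
    excerpt: str   # passage relevant to the health question


def build_grounded_prompt(question: str, context: RetrievedContext) -> str:
    """Inject a target-language excerpt into the prompt (illustrative wording)."""
    return (
        f"Answer the health question in {context.language}, using only the context below. "
        f"If the context is insufficient, say so.\n\n"
        f"Context (from {context.language} Wikipedia, '{context.title}'):\n"
        f"{context.excerpt}\n\n"
        f"Question: {question}\nAnswer:"
    )


def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call (any chat/completions endpoint)."""
    raise NotImplementedError


if __name__ == "__main__":
    ctx = RetrievedContext(
        language="de",
        title="Diabetes mellitus",
        excerpt="Diabetes mellitus ist eine Stoffwechselerkrankung ...",
    )
    prompt = build_grounded_prompt("Was sind typische Symptome von Diabetes?", ctx)
    print(prompt)  # this grounded prompt would then be sent via call_llm(prompt)
```

Without the injected excerpt, the paper reports, answers tend to align with English Wikipedia content even for non-English prompts; the target-language context is what shifts alignment toward culturally relevant knowledge.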
Ipek Baris Schlicht
Universitat Politècnica de València, Deutsche Welle
Applied NLP · Misinformation Detection · Computational Journalism
Burcu Sayin
University of Trento, Italy
Zhixue Zhao
University of Sheffield, United Kingdom
Frederik M. Labonté
Bonn-Aachen International Center for IT, Germany; University of Bonn, Germany; Lamarr Institute for ML and AI, Germany
Cesare Barbera
University of Pisa, Italy
Marco Viviani
Università degli Studi di Milano-Bicocca / DISCo
Social Computing · Natural Language Processing · Information Retrieval · Misinformation · Privacy
Paolo Rosso
Universitat Politècnica de València, Spain; ValgrAI Valencian Graduate School and Research Network of Artificial Intelligence, Spain
Lucie Flek
University of Bonn, Lamarr Institute of Machine Learning and Artificial Intelligence
Natural Language Processing · Machine Learning · Physics · Computational Social Sciences