BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

📅 2024-12-10
🏛️ arXiv.org
📈 Citations: 6
Influential: 2
📄 PDF
🤖 AI Summary
To address the performance gap that low-resource languages, particularly Arabic, face in multimodal medical AI, this paper introduces BiMediX2, the first large multimodal model supporting bilingual (Arabic-English) understanding of medical text and images. Methodologically, BiMediX2 builds on the Llama3.1 architecture, augmented with a vision encoder and fine-tuned via bilingual instruction tuning on 1.6 million Arabic-English medical samples spanning text and image modalities. The authors further propose BiMed-MBench, the first bilingual multimodal medical benchmark, enabling cross-lingual evaluation. Experimental results demonstrate state-of-the-art performance across medical visual question answering, multi-turn dialogue, and radiology report generation: Arabic-language accuracy improves by over 20%, English by over 9%, and factual accuracy on UPHILL exceeds GPT-4 by around 9%. BiMediX2 thus establishes a new technical frontier for bilingual multimodal interaction in clinical AI.

📝 Abstract
This paper introduces BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model (LMM) with a unified architecture that integrates text and visual modalities, enabling advanced image understanding and medical applications. BiMediX2 leverages the Llama3.1 architecture and integrates text and visual capabilities to facilitate seamless interactions in both English and Arabic, supporting text-based inputs and multi-turn conversations involving medical images. The model is trained on an extensive bilingual healthcare dataset consisting of 1.6M samples of diverse medical interactions for both text and image modalities, mixed in Arabic and English. We also propose the first bilingual GPT-4o based medical LMM benchmark named BiMed-MBench. BiMediX2 is benchmarked on both text-based and image-based tasks, achieving state-of-the-art performance across several medical benchmarks. It outperforms recent state-of-the-art models in medical LLM evaluation benchmarks. Our model also sets a new benchmark in multimodal medical evaluations with over 9% improvement in English and over 20% in Arabic evaluations. Additionally, it surpasses GPT-4 by around 9% in UPHILL factual accuracy evaluations and excels in various medical Visual Question Answering, Report Generation, and Report Summarization tasks. The project page, including source code and the trained model, is available at https://github.com/mbzuai-oryx/BiMediX2.
Problem

Research questions and friction points this paper is trying to address.

Develops bilingual medical AI for text and image interactions
Creates comprehensive dataset for medical LLM and LMM tasks
Establishes first Arabic-English medical LMM evaluation benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bilingual medical model for text and images
Trained on 1.6M bilingual healthcare dataset samples
Achieves state-of-the-art multilingual medical benchmark performance
Sahal Shaji Mullappilly
PhD Computer Vision Student, MBZUAI
Vision Language Models, Computer Vision, Object Detection, Real-time models
Mohammed Irfan Kurpath
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
Sara Pieri
PhD Student, Inria, École Normale Supérieure
Vision Language Models, Computer Vision
Saeed Yahya Alseiari
Sheikh Shakhbout Medical City (SSMC)
Shanavas Cholakkal
Govt Medical College Kozhikode
Khaled Aldahmani
Shaikh Tahnoon bin Mohammed Medical City (STMC), Tawam Hospital
F. Khan
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), Linköping University
R. Anwer
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
Salman H. Khan
Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)
Timothy Baldwin
MBZUAI and The University of Melbourne
computational linguistics, natural language processing, artificial intelligence
Hisham Cholakkal
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Computer Vision, Large Multimodal Models, LLM, Healthcare Foundation Model, Conversational Assistant