XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models

📅 2023-06-13
🏛️ Workshop on Biomedical Natural Language Processing
📈 Citations: 50
Influential: 5
🤖 AI Summary
To address the under-explored performance of large vision-language models (VLMs) in radiology, this work introduces XrayGPT, a conversational medical VLM that analyzes chest radiographs and answers open-ended questions about them. Methodologically: (1) a medical visual encoder (MedClip) is aligned with a fine-tuned large language model (Vicuna) through a linear transformation; (2) about 217,000 interactive, high-quality summaries are generated from free-text radiology reports to improve alignment on chest radiograph data; and (3) the resulting model supports both radiograph summarization and open-ended question answering. In an expert evaluation by certified medical doctors on a test subset, more than 70% of the responses were judged scientifically accurate, with an average score of 4/5. The authors present the method as a simple baseline for future research on automated analysis and summarization of chest radiographs, and state that code, models, and instruction sets will be publicly released.
📝 Abstract
The latest breakthroughs in large language models (LLMs) and vision-language models (VLMs) have showcased promising capabilities toward performing a wide range of tasks. Such models are typically trained on massive datasets comprising billions of image-text pairs with diverse tasks. However, their performance on task-specific domains, such as radiology, is still under-explored. While a few works have recently explored LLM-based conversational medical models, they mainly focus on text-based analysis. In this paper, we introduce XrayGPT, a conversational medical vision-language model (VLM) that can analyze and answer open-ended questions about chest radiographs. Specifically, we align a medical visual encoder with a fine-tuned LLM so that the model possesses visual conversation abilities, grounded in an understanding of radiographs and medical knowledge. For improved alignment of chest radiograph data, we generate ~217k interactive and high-quality summaries from free-text radiology reports. Extensive experiments are conducted to validate the merits of XrayGPT. For an expert evaluation, certified medical doctors assessed the output of XrayGPT on a test subset, and the results reveal that more than 70% of the responses are scientifically accurate, with an average score of 4/5. We hope our simple and effective method establishes a solid baseline, facilitating future research toward automated analysis and summarization of chest radiographs. Code, models, and instruction sets will be publicly released.
Problem

Research questions and friction points this paper is trying to address.

Enhancing radiology-specific vision-language model performance
Enabling open-ended question answering for chest radiographs
Generating high-quality summaries from free-text radiology reports to improve alignment on chest radiograph data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns MedClip with Vicuna via linear transformation
Generates ~217k interactive summaries from free-text radiology reports
Enhances LLMs for medical visual conversation
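
The alignment step listed above (a linear transformation between the visual encoder and the LLM) can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, function names, and random stand-in features below are assumptions chosen for illustration, and the only detail taken from the summary is that visual features are mapped into the language model's embedding space by a single linear layer.

```python
import numpy as np

# Hypothetical sizes for illustration only; the real encoder and LLM
# dimensions are not stated in this summary.
VISION_DIM = 8      # width of each visual token from the image encoder
LLM_DIM = 16        # width of the language model's token embeddings
NUM_IMAGE_TOKENS = 4
NUM_TEXT_TOKENS = 3


def make_projection(vision_dim, llm_dim, seed=0):
    """Create the weight and bias of a single linear layer (assumed init)."""
    rng = np.random.default_rng(seed)
    weight = rng.normal(0.0, 0.02, size=(vision_dim, llm_dim))
    bias = np.zeros(llm_dim)
    return weight, bias


def project_visual_tokens(visual_tokens, weight, bias):
    """Map visual tokens (n, vision_dim) into the LLM space (n, llm_dim)."""
    return visual_tokens @ weight + bias


def build_llm_input(visual_embeds, text_embeds):
    """Prepend projected image tokens to the text token embeddings."""
    return np.concatenate([visual_embeds, text_embeds], axis=0)


rng = np.random.default_rng(1)
# Random stand-ins for encoder outputs and prompt embeddings.
visual_tokens = rng.normal(size=(NUM_IMAGE_TOKENS, VISION_DIM))
text_embeds = rng.normal(size=(NUM_TEXT_TOKENS, LLM_DIM))

weight, bias = make_projection(VISION_DIM, LLM_DIM)
projected = project_visual_tokens(visual_tokens, weight, bias)
llm_input = build_llm_input(projected, text_embeds)
```

In a setup like this, the projection is typically the only new component that must be learned to connect the two models, which is what makes the approach simple; whether other parameters are also updated is not specified in this summary.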