🤖 AI Summary
Medical large vision-language models (Med-LVLMs) struggle to jointly support multimodal comprehension and generation. HealthGPT addresses this by unifying medical visual comprehension and generation within a single autoregressive paradigm built on pre-trained open-source large language models (LLMs). Its core technique, Heterogeneous Low-Rank Adaptation (H-LoRA), progressively adapts heterogeneous comprehension and generation knowledge to the base LLM, and is complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. The model is trained on VL-Health, a self-curated medical domain-specific comprehension and generation dataset, and demonstrates strong performance and scalability on unified medical visual tasks. The code is publicly released.
📝 Abstract
We present HealthGPT, a powerful Medical Large Vision-Language Model (Med-LVLM) that integrates medical visual comprehension and generation capabilities within a unified autoregressive paradigm. Our bootstrapping philosophy is to progressively adapt heterogeneous comprehension and generation knowledge to pre-trained large language models (LLMs). This is achieved through a novel heterogeneous low-rank adaptation (H-LoRA) technique, complemented by a tailored hierarchical visual perception approach and a three-stage learning strategy. To train HealthGPT effectively, we devise VL-Health, a comprehensive medical domain-specific comprehension and generation dataset. Experimental results demonstrate the exceptional performance and scalability of HealthGPT on unified medical visual tasks. Our project can be accessed at https://github.com/DCDmllm/HealthGPT.
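To make the H-LoRA idea more concrete, here is a minimal numpy sketch of task-specific low-rank adaptation. All names and the exact routing scheme are assumptions for illustration, not the paper's formulation: a frozen base weight `W` is augmented with a separate low-rank `(A, B)` pair per task family (comprehension vs. generation), so each forward pass adds only the adapter matching the current task.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 8, 2

# Frozen pre-trained weight of one linear layer
W = rng.standard_normal((d_in, d_out))

# One low-rank (A, B) adapter pair per task family (illustrative split;
# the actual H-LoRA structure may differ)
adapters = {
    task: (
        rng.standard_normal((d_in, rank)) * 0.01,  # A: d_in x r
        np.zeros((rank, d_out)),                   # B: r x d_out, zero-init
    )
    for task in ("comprehension", "generation")
}

def forward(x, task):
    """Base path plus the task-specific low-rank update: x @ (W + A @ B)."""
    A, B = adapters[task]
    return x @ W + x @ A @ B

x = rng.standard_normal((1, d_in))
# With B zero-initialized, every adapter starts as a no-op update,
# so the model initially matches the frozen base layer:
assert np.allclose(forward(x, "comprehension"), x @ W)
```

Zero-initializing `B` is the standard LoRA trick: training starts from the unmodified pre-trained behavior, and only the small `A`/`B` matrices are updated per task, which is what lets heterogeneous capabilities share one frozen backbone.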