Multi-Aspect Knowledge-Enhanced Medical Vision-Language Pretraining with Multi-Agent Data Generation

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing medical vision-language pretraining methods struggle with noisy web-scale data and unstructured, lengthy clinical notes. To address these challenges, we propose an ontology-guided multi-agent collaborative pretraining framework. First, we construct a multi-agent system based on foundation models to autonomously generate high-quality, fine-grained skin image descriptions, validated via retrieval for semantic fidelity. Second, we design an ontology-guided attention mechanism coupled with multi-level contrastive learning to explicitly model semantic relationships among medical concepts, enabling holistic–local cross-modal alignment. Third, we incorporate knowledge distillation to enhance generalization. Evaluated on eight dermatological datasets, our method achieves state-of-the-art performance in zero-shot disease classification and cross-modal retrieval. Furthermore, we publicly release Derm1M-AgentAug—a large-scale, high-quality augmented dataset comprising 400K image–text pairs—facilitating future research in medical vision-language understanding.
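The summary mentions retrieval-based validation of the agent-generated descriptions without detailing it. A minimal sketch of one plausible check, assuming CLIP-style embeddings and a top-k acceptance rule (both are assumptions, not the paper's stated procedure):

```python
# Hedged sketch of retrieval-based verification of agent-generated captions.
# The encoder, candidate-pool construction, and top-k rule are illustrative
# assumptions, not the paper's exact pipeline.
import torch
import torch.nn.functional as F

@torch.no_grad()
def accept_generated_caption(image_emb: torch.Tensor,
                             caption_embs: torch.Tensor,
                             top_k: int = 1) -> bool:
    """Accept the generated caption only if its paired image retrieves it highly.

    image_emb:    (D,)   embedding of one skin image
    caption_embs: (N, D) embeddings of N candidate captions; row 0 is the
                  caption generated for this image, rows 1..N-1 are distractors.
    """
    image_emb = F.normalize(image_emb, dim=-1)
    caption_embs = F.normalize(caption_embs, dim=-1)
    sims = caption_embs @ image_emb          # cosine similarity to every candidate
    rank = int((sims > sims[0]).sum())       # candidates scoring above the generated one
    return rank < top_k                      # keep the pair if it is within the top-k
```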

📝 Abstract
Vision-language pretraining (VLP) has emerged as a powerful paradigm in medical image analysis, enabling representation learning from large-scale image-text pairs without relying on expensive manual annotations. However, existing methods often struggle with the noise inherent in web-collected data and the complexity of unstructured long medical texts. To address these challenges, we propose a novel VLP framework integrating a Multi-Agent data GENeration (MAGEN) system and Ontology-based Multi-Aspect Knowledge-Enhanced (O-MAKE) pretraining. First, MAGEN enhances data quality by synthesizing knowledge-enriched descriptions via a foundation model-assisted captioning and retrieval-based verification pipeline. Second, O-MAKE addresses the difficulty of learning from long, unstructured texts by decomposing them into distinct knowledge aspects. This facilitates fine-grained alignment at both global and patch levels, while explicitly modeling medical concept relationships through ontology-guided mechanisms. We validate our framework in the field of dermatology, where comprehensive experiments demonstrate the effectiveness of each component. Our approach achieves state-of-the-art zero-shot performance on disease classification and cross-modal retrieval tasks across eight datasets. Our code and the augmented dataset Derm1M-AgentAug, comprising over 400k skin-image-text pairs, will be released at https://github.com/SiyuanYan1/Derm1M.
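The abstract describes alignment at both the global and patch levels over decomposed knowledge aspects, but gives no formulation here. A minimal sketch of a two-level contrastive objective, assuming CLIP-style encoders (the shapes, temperature, loss weight, and aspect pooling are illustrative assumptions, not the paper's method):

```python
# Sketch of a two-level (global + aspect/patch) contrastive objective.
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE over paired embeddings a, b of shape (B, D)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def multi_level_loss(img_global, txt_global, img_patches, aspect_embs, w=0.5):
    """img_global, txt_global: (B, D); img_patches: (B, P, D); aspect_embs: (B, A, D)."""
    loss_global = info_nce(img_global, txt_global)
    # Soft-attend image patches for each text aspect, then contrast the
    # aspect-pooled image view against the pooled text aspects.
    sims = torch.einsum('bpd,bad->bpa',
                        F.normalize(img_patches, dim=-1),
                        F.normalize(aspect_embs, dim=-1))
    attn = sims.softmax(dim=1)                               # patch weights per aspect
    img_per_aspect = torch.einsum('bpa,bpd->bad', attn, img_patches)
    loss_local = info_nce(img_per_aspect.mean(dim=1), aspect_embs.mean(dim=1))
    return loss_global + w * loss_local
```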
Problem

Research questions and friction points this paper is trying to address.

Addresses noise in web-collected medical image-text data
Handles complexity of unstructured long medical texts
Enhances fine-grained alignment and medical concept modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent data generation for knowledge-enriched descriptions
Ontology-based multi-aspect knowledge-enhanced pretraining (see the attention sketch after this list)
Fine-grained alignment at global and patch levels
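Neither the summary nor the abstract specifies how the ontology guides attention. One plausible reading, sketched below under the assumption of an additive bias derived from an ontology adjacency matrix (the function name, bias scale, and matrix construction are hypothetical):

```python
# Illustrative ontology-guided attention bias: attention between concept tokens
# is boosted when the concepts are linked in a medical ontology. The paper's
# exact mechanism may differ.
import torch
import torch.nn.functional as F

def ontology_guided_attention(q, k, v, ontology_adj, bias_scale=1.0):
    """q, k, v: (B, T, D) concept-token features; ontology_adj: (T, T) 0/1 matrix
    marking pairs of concepts related in the ontology."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5      # (B, T, T) scaled dot-product
    scores = scores + bias_scale * ontology_adj      # additive bias for related concepts
    return F.softmax(scores, dim=-1) @ v             # standard attention readout
```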
Xieji Li
Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
Siyuan Yan
Research Fellow @ Monash University
AI for Medicine, Foundation Model
Yingsheng Liu
Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
H. Soyer
Frazer Institute, Dermatology Research Centre, The University of Queensland, Brisbane, QLD 4072, Australia
Monika Janda
The University of Queensland and Queensland University of Technology
Cancer research, behavioral research
Victoria Mar
Victorian Melanoma Service, Alfred Health, Melbourne, VIC 3004, Australia
Z. Ge
Department of Data Science and AI, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia