MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration

📅 2026-03-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses a critical limitation of existing medical vision–language pretraining approaches: they neglect the varying difficulty levels of concepts, violating cognitive learning principles and leading to suboptimal feature representations and limited generalization. To bridge this gap, the study introduces cognitive development theory into medical multimodal pretraining for the first time and proposes a knowledge-driven cognitive orchestration mechanism. Specifically, it constructs a two-stage curriculum based on diagnostic sensitivity and intra-class sample representativeness to sequence the training data, and designs a self-paced asymmetric contrastive loss to dynamically adjust the learning objective. Extensive experiments across three medical imaging scenarios demonstrate that the proposed method consistently outperforms current baselines on diverse downstream tasks, confirming its effectiveness and robustness.
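The two-stage curriculum sequences pretraining samples from easy to hard using two knowledge-driven signals: diagnostic sensitivity and intra-class sample representativeness. A minimal sketch of such an ordering is below; the combined difficulty score, the weighting `alpha`, and the easy-half split point are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def curriculum_order(sensitivity, representativeness, alpha=0.5):
    """Order pretraining samples from 'easy' to 'hard'.

    sensitivity, representativeness: per-sample scores in [0, 1].
    Treating low diagnostic sensitivity and high representativeness as
    'easier' is an assumption for illustration.
    """
    difficulty = alpha * sensitivity + (1 - alpha) * (1 - representativeness)
    return np.argsort(difficulty)  # sample indices, easiest first

def two_stage_subset(order, stage):
    """Stage 1: train on the easier half; stage 2: train on the full set."""
    n = len(order)
    return order[: n // 2] if stage == 1 else order
```

In practice the ordering would feed a data sampler, so later (harder) concepts only enter the contrastive objective once the easier ones have shaped the representation.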

๐Ÿ“ Abstract
Medical vision-language pretraining (VLP) models have recently been investigated for their generalization to diverse downstream tasks. However, current medical VLP methods typically force the model to learn simple and complex concepts simultaneously. This anti-cognitive process leads to suboptimal feature representations, especially under distribution shift. To address this limitation, we propose a Knowledge-driven Cognitive Orchestration for Medical VLP (MedKCO) that governs both the ordering of the pretraining data and the learning objective of vision-language contrast. Specifically, we design a two-level curriculum incorporating diagnostic sensitivity and intra-class sample representativeness to order the pretraining data. Moreover, considering the inter-class similarity of medical images, we introduce a self-paced asymmetric contrastive loss to dynamically adjust the participation of samples in the pretraining objective. We evaluate the proposed pretraining method on three medical imaging scenarios across multiple vision-language downstream tasks, and compare it with several curriculum learning methods. Extensive experiments show that our method significantly surpasses all baselines. https://github.com/Mr-Talon/MedKCO.
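The self-paced asymmetric contrastive loss can be pictured as a standard bidirectional InfoNCE objective with two modifications: a self-paced gate that lets only sufficiently easy samples participate at the current threshold, and asymmetric weights on the image-to-text and text-to-image directions. The sketch below illustrates this combination; the gating rule, threshold `lam`, and direction weights `w_i2t`/`w_t2i` are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np

def self_paced_asymmetric_contrastive(sim, lam=1.0, tau=0.07,
                                      w_i2t=1.0, w_t2i=0.5):
    """Illustrative self-paced asymmetric contrastive loss.

    sim: (N, N) image-text similarity matrix; diagonal entries are the
    matched pairs. Returns a scalar loss over the gated samples.
    """
    logits = sim / tau

    def nce(l):
        # per-sample InfoNCE loss via a numerically stable log-softmax
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(log_probs)

    li2t, lt2i = nce(logits), nce(logits.T)
    # self-paced hard gate: only samples easier than the threshold participate
    v = ((li2t + lt2i) / 2 < lam).astype(float)
    # asymmetric weighting of the two contrastive directions
    per_sample = w_i2t * li2t + w_t2i * lt2i
    return (v * per_sample).sum() / max(v.sum(), 1.0)
```

Raising `lam` over training admits progressively harder samples, which is what makes the objective self-paced rather than fixed.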
Problem

Research questions and friction points this paper is trying to address.

medical vision-language pretraining
cognitive learning
distribution shift
feature representation
curriculum learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

medical vision-language pretraining
curriculum learning
asymmetric contrastive loss
knowledge-driven orchestration
diagnostic sensitivity
Chenran Zhang
School of Computer Science and Engineering, Southeast University, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, China
Ruiqi Wu
School of Computer Science and Engineering, Southeast University, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, China
Tao Zhou
Nanjing University of Science and Technology, IIAI, UNC, SJTU
Computer vision, machine learning, medical image analysis, AI in Healthcare
Yi Zhou
School of Computer Science and Engineering, Southeast University, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, China