Forging a Dynamic Memory: Retrieval-Guided Continual Learning for Generalist Medical Foundation Models

πŸ“… 2025-12-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the significant inter-modal domain gaps and the rapid forgetting of intra-modal fine-grained features in continual learning (CL) for medical multimodal large language models (MLLMs), this paper proposes the first retrieval-augmented continual learning framework tailored to medical applications. Methodologically, it integrates Retrieval-Augmented Generation (RAG) into CL by constructing a multi-layer dynamic RAG memory grounded in 18 million biomedical literature entries; it designs a dynamic knowledge distillation framework that adaptively modulates parameter importance, knowledge granularity, and the reference distribution; and it establishes MGTIL, the first general-purpose task-incremental benchmark for medical multimodal learning. Evaluated on MGTIL, the approach surpasses prior state-of-the-art methods across all metrics, notably enhancing cross-modal domain adaptability, preserving fine-grained semantic representations, and improving real-time learning efficiency on novel tasks.

πŸ“ Abstract
Multimodal biomedical Vision-Language Models (VLMs) exhibit immense potential in the field of Continual Learning (CL). However, they confront a core dilemma: how to preserve fine-grained intra-modality features while bridging the significant domain gap across modalities. To address this challenge, we propose a comprehensive framework. Leveraging our comprehensive multimodal medical retrieval database of 18 million entries derived from PubMed scientific papers, we pioneer the integration of Retrieval-Augmented Generation (RAG) into CL. Specifically, we employ a multi-modal, multi-layer RAG system that provides real-time guidance for model fine-tuning through dynamic, on-demand knowledge retrieval. Building upon this, we introduce a dynamic knowledge distillation framework. This framework resolves the aforementioned core dilemma by dynamically modulating the importance of the parameter space, the granularity of the distilled knowledge, and the data distribution of the reference dataset according to the required level of detail. To thoroughly validate the clinical value of our strategy, we design a more rigorous Medical Generalist Task Incremental Learning (MGTIL) benchmark, engineered to simultaneously evaluate the model's capacity for adaptation to significant domain shifts, retention of subtle intra-domain features, and real-time learning of novel, complex medical tasks. Extensive experiments demonstrate that our method achieves state-of-the-art (SOTA) performance across all metrics. The code is provided in the supplementary materials.
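The "dynamically modulated" distillation idea can be illustrated with a minimal sketch: a combined objective whose distillation weight shrinks as an estimated domain shift grows, so that under a large inter-modal shift the student leans more on new-task supervision than on the old teacher. All names and the `1 / (1 + shift)` schedule here are illustrative assumptions, not the paper's actual formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_kd_loss(student_logits, teacher_logits, label,
                    domain_shift, temperature=2.0):
    """Hypothetical dynamic distillation objective:
    task cross-entropy plus a KL distillation term whose weight
    decays with the estimated domain shift between tasks."""
    # Standard task loss on the new task's hard label.
    task_loss = -math.log(softmax(student_logits)[label])
    # Distillation term: KL(teacher || student) on softened outputs.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kd_loss = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    # Illustrative modulation: trust the teacher less under large shift.
    lam = 1.0 / (1.0 + domain_shift)
    return task_loss + lam * kd_loss
```

With a large `domain_shift`, the returned loss approaches the pure task loss; with `domain_shift = 0` the distillation term is fully weighted, which is one simple way to trade intra-modal feature retention against cross-modal adaptation.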
Problem

Research questions and friction points this paper is trying to address.

Preserving fine-grained intra-modality features while bridging domain gaps across modalities
Integrating Retrieval-Augmented Generation into Continual Learning for medical models
Dynamically modulating knowledge distillation to handle detailed medical task adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates retrieval-augmented generation for continual learning
Uses dynamic knowledge distillation to balance feature preservation
Introduces a medical task benchmark for rigorous evaluation
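As a toy illustration of the retrieval-memory idea behind these contributions, the sketch below stores text passages and returns the top-k most similar to a query by bag-of-words cosine similarity. The class name and scoring scheme are hypothetical stand-ins for the paper's 18-million-entry multi-layer RAG memory, which is not described at implementation level here.

```python
import math
from collections import Counter

class RetrievalMemory:
    """Toy stand-in for a RAG memory: stores text passages and ranks
    them against a query by bag-of-words cosine similarity."""

    def __init__(self):
        self.passages = []

    def add(self, text):
        # Store the raw text alongside its token-count vector.
        self.passages.append((text, Counter(text.lower().split())))

    @staticmethod
    def _cosine(a, b):
        shared = set(a) & set(b)
        dot = sum(a[w] * b[w] for w in shared)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def top_k(self, query, k=2):
        """Return the k passages most similar to the query."""
        q = Counter(query.lower().split())
        ranked = sorted(self.passages,
                        key=lambda p: self._cosine(q, p[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]
```

In a retrieval-guided fine-tuning loop, the returned passages would be prepended to the training prompt so the model's updates are grounded in retrieved domain knowledge rather than parameters alone.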