🤖 AI Summary
Existing continual learning methods for medical imaging rely on simplistic class-name templates, failing to capture fine-grained disease semantics. To address this, we propose a cross-modal continual learning framework that integrates visual concepts generated by large language models (LLMs). Our method dynamically constructs a visual concept pool from LLM outputs and filters it via semantic similarity; introduces an image–concept cross-modal attention module; and incorporates an attention-based loss to enhance class-discriminative feature learning. The framework preserves performance on previously learned classes while significantly improving recognition accuracy for novel classes. Evaluated on multiple medical and natural-image continual learning benchmarks, it achieves state-of-the-art results, yielding an average classification accuracy improvement of 3.2% over prior methods. Comprehensive ablation studies and cross-domain experiments validate its generalizability and adaptability to diverse imaging modalities and task settings.
📝 Abstract
Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. Integrating multimodal information can significantly enhance continual learning of image classes. However, while existing approaches utilize textual modality information, they rely solely on simplistic templates containing only the class name, thereby neglecting richer semantic information. To address these limitations, we propose a novel framework that harnesses visual concepts generated by large language models (LLMs) as discriminative semantic guidance. Our method dynamically constructs a visual concept pool with a similarity-based filtering mechanism to prevent redundancy. To integrate the concepts into the continual learning process, we then employ a cross-modal image-concept attention module, coupled with an attention loss. Through attention, the module leverages the semantic knowledge of relevant visual concepts and produces class-representative fused features for classification. Experiments on medical and natural image datasets show that our method achieves state-of-the-art performance, demonstrating its effectiveness and superiority. We will release the code publicly.
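The two mechanisms described above, similarity-based filtering of the concept pool and cross-modal image–concept attention, can be illustrated with a minimal sketch. All function names, the cosine-similarity threshold, and the residual fusion step are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def filter_concepts(pool, candidates, threshold=0.95):
    """Add LLM-generated concept embeddings to the pool, skipping
    near-duplicates whose cosine similarity to an existing concept
    exceeds the threshold (threshold value is an assumption)."""
    for c in candidates:
        c = c / np.linalg.norm(c)  # unit-normalize so dot product = cosine
        if all(float(c @ p) < threshold for p in pool):
            pool.append(c)
    return pool

def cross_modal_attention(img_feat, concepts):
    """Use the image feature as the query and concept embeddings as
    keys/values; return a fused feature and the attention weights
    (the weights could also drive an attention loss)."""
    K = np.stack(concepts)                       # (n_concepts, d)
    scores = K @ img_feat / np.sqrt(len(img_feat))
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                           # softmax over concepts
    fused = attn @ K                             # weighted concept summary
    return img_feat + fused, attn                # residual fusion (assumption)
```

For example, feeding two nearly identical concept embeddings through `filter_concepts` keeps only one, and the attention weights returned by `cross_modal_attention` always sum to one, so the fused feature is a convex mixture of concept semantics added to the image feature.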