Augmenting Continual Learning of Diseases with LLM-Generated Visual Concepts

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing continual learning methods for medical imaging rely on simplistic class-name templates, failing to capture fine-grained disease semantics. To address this, we propose a cross-modal continual learning framework that integrates visual concepts generated by large language models (LLMs). Our method dynamically constructs a visual concept pool from LLM outputs and filters it via semantic similarity; introduces an image-concept cross-modal attention module; and incorporates an attention-based loss to enhance class-discriminative feature learning. The framework preserves performance on previously learned classes while significantly improving recognition accuracy on novel classes. Evaluated on multiple medical and natural-image continual learning benchmarks, it achieves state-of-the-art results, yielding an average classification-accuracy improvement of 3.2% over prior methods. Comprehensive ablation studies and cross-domain experiments validate its generalizability across diverse imaging modalities and task settings.

📝 Abstract
Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. Integrating multimodal information can significantly enhance continual learning of image classes. However, while existing approaches do utilize textual modality information, they rely solely on simplistic templates containing only a class name, thereby neglecting richer semantic information. To address these limitations, we propose a novel framework that harnesses visual concepts generated by large language models (LLMs) as discriminative semantic guidance. Our method dynamically constructs a visual concept pool with a similarity-based filtering mechanism to prevent redundancy. To integrate these concepts into the continual learning process, we then employ a cross-modal image-concept attention module coupled with an attention loss. Through attention, the module leverages the semantic knowledge of relevant visual concepts and produces class-representative fused features for classification. Experiments on medical and natural image datasets show that our method achieves state-of-the-art performance, demonstrating its effectiveness. We will release the code publicly.
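The paper does not detail the filtering step here, but the similarity-based redundancy filter described in the abstract can be sketched as follows. This is a minimal illustration, assuming concepts have already been embedded (e.g., by a text encoder) into fixed-dimensional vectors; the function name, threshold value, and use of cosine similarity are assumptions, not the authors' implementation.

```python
import numpy as np

def filter_concept_pool(candidate_embs, pool_embs, threshold=0.9):
    """Hypothetical sketch of similarity-based concept filtering:
    add each candidate concept embedding to the pool only if its
    cosine similarity to every existing pool entry is below the
    threshold, so near-duplicate LLM-generated concepts are dropped."""
    pool = list(pool_embs)
    for emb in candidate_embs:
        v = emb / np.linalg.norm(emb)
        if all(float(v @ (p / np.linalg.norm(p))) < threshold for p in pool):
            pool.append(emb)
    return pool
```

A concept paraphrase such as "round, well-circumscribed mass" would land close to "circular, sharply bounded lesion" in embedding space and be rejected, keeping the pool compact as new classes arrive.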
Problem

Research questions and friction points this paper is trying to address.

Enhancing continual learning with LLM-generated visual concepts
Addressing simplistic text templates in multimodal medical classification
Improving class-representative features via cross-modal attention mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-generated visual concepts enhance continual learning
Dynamic visual concept pool with similarity filtering
Cross-modal image-concept attention module for fusion
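The cross-modal fusion named above can be illustrated with standard scaled dot-product attention, the image feature acting as the query and the concept embeddings as keys and values. This is a hedged sketch only: the residual fusion, shared embedding dimension, and absence of learned projection matrices are simplifying assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def image_concept_attention(img_feat, concept_embs):
    """Illustrative image-concept attention: score each concept
    against the image feature, form an attention-weighted concept
    summary, and fuse it back into the image feature.

    img_feat:     (d,)   image feature (query)
    concept_embs: (n, d) concept embeddings (keys/values)
    """
    d = img_feat.shape[-1]
    scores = concept_embs @ img_feat / np.sqrt(d)  # (n,) similarity scores
    weights = softmax(scores)                      # attention over concepts
    concept_ctx = weights @ concept_embs           # (d,) weighted concept summary
    fused = img_feat + concept_ctx                 # residual fusion for the classifier
    return fused, weights
```

The attention weights returned here are the natural target for an attention loss like the one the paper describes, encouraging the model to attend to concepts belonging to the image's true class.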
Jiantao Tan
Sun Yat-sen University, Guangzhou, China
Peixian Ma
IDEA Research / HKUST(GZ)
NL2SQL · NLP · Agents · Large Language Models · Reinforcement Learning
Kanghao Chen
Hong Kong University of Science and Technology (Guangzhou)
Computer Vision
Zhiming Dai
Sun Yat-sen University, Guangzhou, China
Ruixuan Wang
Sun Yat-sen University, Guangzhou, China; Peng Cheng Laboratory, Shenzhen, China; Key Laboratory of Machine Intelligence and Advanced Computing, MOE, Guangzhou, China