🤖 AI Summary
Existing continual learning methods for medical imaging rely on simplistic class-name templates, failing to capture fine-grained disease semantics. To address this, we propose a cross-modal continual learning framework that integrates visual concepts generated by large language models (LLMs). Our method dynamically constructs a visual concept pool from LLM outputs and filters it via semantic similarity; introduces an image–concept cross-modal attention module; and incorporates an attention-based loss to enhance class-discriminative feature learning. The framework preserves performance on previously learned classes while significantly improving recognition accuracy for novel classes. Evaluated on multiple medical and natural-image continual learning benchmarks, it achieves state-of-the-art results, yielding an average classification accuracy improvement of 3.2% over prior methods. Comprehensive ablation studies and cross-domain experiments validate its generalizability and adaptability to diverse imaging modalities and task settings.
📝 Abstract
Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. Integrating multimodal information can significantly enhance continual learning of image classes. However, while existing approaches utilize textual modality information, they rely solely on simplistic templates containing only the class name, thereby neglecting richer semantic information. To address these limitations, we propose a novel framework that harnesses visual concepts generated by large language models (LLMs) as discriminative semantic guidance. Our method dynamically constructs a visual concept pool with a similarity-based filtering mechanism to prevent redundancy. To integrate the concepts into the continual learning process, we then employ a cross-modal image-concept attention module, coupled with an attention loss. Through attention, the module leverages the semantic knowledge of relevant visual concepts and produces class-representative fused features for classification. Experiments on medical and natural image datasets show that our method achieves state-of-the-art performance, demonstrating its effectiveness and superiority. We will release the code publicly.
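The two mechanisms described above, similarity-based filtering of the concept pool and cross-modal image–concept attention, can be illustrated with a minimal sketch. All function names, the cosine-similarity threshold, and the residual fusion step are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def filter_concepts(pool, candidates, threshold=0.95):
    """Add LLM-generated concept embeddings to the pool, skipping
    near-duplicates whose cosine similarity to an existing concept
    exceeds the threshold (threshold value is an assumption)."""
    for c in candidates:
        c = c / np.linalg.norm(c)  # unit-normalize so dot product = cosine
        if all(float(c @ p) < threshold for p in pool):
            pool.append(c)
    return pool

def cross_modal_attention(img_feat, concepts):
    """Use the image feature as the query and concept embeddings as
    keys/values; return a fused feature and the attention weights
    (the weights could also drive an attention loss)."""
    K = np.stack(concepts)                       # (n_concepts, d)
    scores = K @ img_feat / np.sqrt(len(img_feat))
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                           # softmax over concepts
    fused = attn @ K                             # weighted concept summary
    return img_feat + fused, attn                # residual fusion (assumption)
```

For example, feeding two nearly identical concept embeddings through `filter_concepts` keeps only one, and the attention weights returned by `cross_modal_attention` always sum to one, so the fused feature is a convex mixture of concept semantics added to the image feature.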