Towards Robust Visual Continual Learning with Multi-Prototype Supervision

📅 2025-09-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing language-guided visual continual learning approaches face two key challenges: semantic ambiguity, where polysemous category names lead to conflicting visual representations, and intra-class visual diversity, where a single prototype fails to capture appearance variations within a class. To address these, we propose MuproCL, a multi-prototype supervision framework. It leverages a frozen pre-trained language model to generate initial semantic targets and employs a lightweight LLM agent for category disambiguation and visual-modal expansion, thereby constructing multiple context-aware semantic prototypes. An adaptive LogSumExp aggregation mechanism dynamically selects and combines the most relevant prototypes for each input. Crucially, MuproCL abandons the restrictive single-prototype assumption without adding parameters to the visual backbone, enhancing representation robustness. Extensive experiments on multiple continual learning benchmarks demonstrate consistent improvements over state-of-the-art methods, validating the effectiveness and generalizability of multi-prototype supervision in semantically complex scenarios.

📝 Abstract
Language-guided supervision, which utilizes a frozen semantic target from a Pretrained Language Model (PLM), has emerged as a promising paradigm for visual Continual Learning (CL). However, relying on a single target introduces two critical limitations: 1) semantic ambiguity, where a polysemous category name results in conflicting visual representations, and 2) intra-class visual diversity, where a single prototype fails to capture the rich variety of visual appearances within a class. To this end, we propose MuproCL, a novel framework that replaces the single target with multiple, context-aware prototypes. Specifically, we employ a lightweight LLM agent to perform category disambiguation and visual-modal expansion to generate a robust set of semantic prototypes. A LogSumExp aggregation mechanism allows the vision model to adaptively align with the most relevant prototype for a given image. Extensive experiments across various CL baselines demonstrate that MuproCL consistently enhances performance and robustness, establishing a more effective path for language-guided continual learning.
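For intuition, the adaptive alignment described in the abstract can be viewed as a LogSumExp-weighted similarity between a visual feature and a set of class prototypes. The sketch below is a hypothetical illustration only: the tensor shapes, temperature value, and function names are assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def multi_prototype_alignment(image_feat, prototypes, tau=0.07):
    """Hypothetical sketch of LogSumExp aggregation over multiple semantic
    prototypes; shapes, temperature, and naming are assumptions.

    image_feat: (D,) visual feature from the continual-learning backbone
    prototypes: (K, D) frozen, PLM-derived prototypes for one class
    """
    image_feat = F.normalize(image_feat, dim=-1)
    prototypes = F.normalize(prototypes, dim=-1)
    # Cosine similarities between the image and each of the K prototypes.
    sims = (prototypes @ image_feat) / tau  # shape (K,)
    # LogSumExp acts as a soft maximum: the best-matching prototype
    # dominates the objective, while the others still receive gradient.
    return torch.logsumexp(sims, dim=0)
```

Maximizing this quantity (e.g., by using its negative as a loss term) would pull each image toward its most relevant prototype, which matches the adaptive alignment behaviour the abstract describes.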
Problem

Research questions and friction points this paper is trying to address.

Addresses semantic ambiguity in language-guided continual learning
Resolves intra-class visual diversity with multiple prototypes
Enhances robustness and performance in visual continual learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-prototype supervision replacing single targets
LLM agent for disambiguation and prototype generation
LogSumExp aggregation for adaptive visual alignment
🔎 Similar Papers
Xiwei Liu
Mohamed bin Zayed University of Artificial Intelligence
Yulong Li
Mohamed bin Zayed University of Artificial Intelligence
Yichen Li
Mohamed bin Zayed University of Artificial Intelligence
Xinlin Zhuang
Mohamed bin Zayed University of Artificial Intelligence
Haolin Yang
University of Chicago
large language models, natural language processing
Huifa Li
East China Normal University
Deep Learning, Graph Neural Network, LLM, AI4Science
Imran Razzak
MBZUAI, Abu Dhabi
Human-Centered AI, Medical Image Analysis, Medical Artificial Intelligence, Computational Biology