Towards Robust Visual Continual Learning with Multi-Prototype Supervision

📅 2025-09-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing language-guided approaches to visual continual learning supervise each class with a single semantic target, which raises two problems: semantic ambiguity, where a polysemous category name produces conflicting representations, and intra-class visual diversity, where one prototype cannot capture the variation in appearance within a class. To address these, we propose MuproCL, a multi-prototype supervision framework. It uses a frozen pre-trained language model to generate semantic targets and employs a lightweight LLM agent for category disambiguation and visual-modal expansion, thereby constructing multiple context-aware semantic prototypes per category. A LogSumExp aggregation mechanism lets the vision model align adaptively with the most relevant prototypes for each input. MuproCL drops the restrictive single-prototype assumption without adding parameters to the visual backbone. Experiments across various continual learning baselines show consistent gains in performance and robustness, supporting multi-prototype supervision as a more effective form of language guidance.

📝 Abstract
Language-guided supervision, which utilizes a frozen semantic target from a Pretrained Language Model (PLM), has emerged as a promising paradigm for visual Continual Learning (CL). However, relying on a single target introduces two critical limitations: 1) semantic ambiguity, where a polysemous category name results in conflicting visual representations, and 2) intra-class visual diversity, where a single prototype fails to capture the rich variety of visual appearances within a class. To this end, we propose MuproCL, a novel framework that replaces the single target with multiple, context-aware prototypes. Specifically, we employ a lightweight LLM agent to perform category disambiguation and visual-modal expansion to generate a robust set of semantic prototypes. A LogSumExp aggregation mechanism allows the vision model to adaptively align with the most relevant prototype for a given image. Extensive experiments across various CL baselines demonstrate that MuproCL consistently enhances performance and robustness, establishing a more effective path for language-guided continual learning.
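
The LogSumExp aggregation step can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: cosine similarity as the alignment measure, a temperature `tau`, and the hypothetical helper names `class_score` and `prototype_weights`. The toy two-dimensional vectors stand in for PLM embeddings of two senses of one category name.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) via max subtraction."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values))

def class_score(feature, prototypes, tau=0.1):
    """Aggregate per-prototype similarities into one class score.

    Assumed form: tau * LSE(sim_k / tau), a smooth maximum over the
    prototypes of a single class. Not taken from the paper.
    """
    scaled = [cosine(feature, p) / tau for p in prototypes]
    return tau * logsumexp(scaled)

def prototype_weights(feature, prototypes, tau=0.1):
    """Softmax weights showing how much each prototype contributes."""
    scaled = [cosine(feature, p) / tau for p in prototypes]
    lse = logsumexp(scaled)
    return [math.exp(s - lse) for s in scaled]

# Toy example: one polysemous category with two sense prototypes.
sense_a = [1.0, 0.0]   # hypothetical embedding of the first sense
sense_b = [0.0, 1.0]   # hypothetical embedding of the second sense
image_feature = [0.9, 0.1]  # lies close to the first sense

score = class_score(image_feature, [sense_a, sense_b])
weights = prototype_weights(image_feature, [sense_a, sense_b])
```

With a small `tau`, the weights concentrate on the prototype nearest the image feature, which is one way to read "adaptively align with the most relevant prototype"; a larger `tau` spreads the weight across prototypes.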
Problem

Research questions and friction points this paper is trying to address.

Addresses semantic ambiguity in language-guided continual learning
Resolves intra-class visual diversity with multiple prototypes
Enhances robustness and performance in visual continual learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-prototype supervision replacing single targets
LLM agent for disambiguation and prototype generation
LogSumExp aggregation for adaptive visual alignment
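
Why LogSumExp yields adaptive alignment can be seen from two standard properties of the function. The notation here is assumed for illustration and does not come from the paper: $s_k$ is the similarity between an image feature and the $k$-th of $K$ prototypes of its class, and $\tau > 0$ is a temperature.

```latex
% Aggregated score: a smooth maximum over the K prototype similarities
\[
  S = \tau \log \sum_{k=1}^{K} \exp\!\left(\frac{s_k}{\tau}\right),
  \qquad
  \max_{k} s_k \;\le\; S \;\le\; \max_{k} s_k + \tau \log K .
\]
% Gradient: each prototype receives a softmax-weighted share of the signal
\[
  \frac{\partial S}{\partial s_k}
  = \frac{\exp(s_k/\tau)}{\sum_{j=1}^{K} \exp(s_j/\tau)} .
\]
```

The bound shows that the aggregate tracks the best-matching prototype, and the gradient shows that prototypes matching the image poorly receive little training signal, so an image is not pulled toward an irrelevant sense of its category name.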