🤖 AI Summary
Multi-label online continual learning (MOCL) confronts three core challenges: catastrophic forgetting, missing labels, and imbalanced class distributions. Existing methods nevertheless overlook label-specific region identification, a fundamental capability in multi-label learning. To address this gap, the authors propose CUTER (CUT-out-and-Experience-Replay), an orthogonal, plug-and-play strategy that brings label-level region localization and cropping into online continual learning. CUTER leverages pretrained models’ localization capacity to identify, strengthen, and cut out label-specific regions, uses them as fine-grained supervision signals, and replays the resulting crops for knowledge retention. By unifying these components, CUTER simultaneously mitigates forgetting, compensates for missing labels, and alleviates class imbalance. Extensive experiments on multiple multi-label image benchmarks show that it outperforms existing methods and integrates seamlessly with existing approaches. The code is publicly available.
📝 Abstract
Multi-Label Online Continual Learning (MOCL) requires models to learn continuously from endless multi-label data streams, facing complex challenges including persistent catastrophic forgetting, potential missing labels, and uncontrollable imbalanced class distributions. While existing MOCL methods attempt to address these challenges through various techniques, they all overlook label-specific region identification and feature learning, a fundamental solution rooted in multi-label learning but challenging to achieve in the online setting with incremental and partial supervision. To this end, we first leverage the inherent structural information of input data to evaluate and verify the innate localization capability of different pre-trained models. Then, we propose CUTER (CUT-out-and-Experience-Replay), a simple yet versatile strategy that provides fine-grained supervision signals by further identifying, strengthening, and cutting out label-specific regions for efficient experience replay. It not only enables models to simultaneously address catastrophic forgetting, missing labels, and class imbalance, but also serves as an orthogonal solution that seamlessly integrates with existing approaches. Extensive experiments on multiple multi-label image benchmarks demonstrate the superiority of our proposed method. The code is available at https://github.com/wxr99/Cut-Replay.
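As a rough illustration of the cut-out-and-replay idea described above, the sketch below crops the region where a per-label localization heatmap is strongest and stores the crop in a small replay memory. It is not the authors' implementation. Every name here (`label_region_box`, `cut_out`, `ReservoirBuffer`, `rel_threshold`) is hypothetical, the heatmap is assumed to already exist at the same resolution as the image, and reservoir sampling is an assumed memory-update policy that the abstract does not specify.

```python
import random
from typing import List, Optional, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def label_region_box(heatmap: Grid, rel_threshold: float = 0.5) -> Optional[Box]:
    """Bounding box of cells scoring >= rel_threshold * max (hypothetical rule)."""
    peak = max(max(row) for row in heatmap)
    if peak <= 0:
        return None  # no positive evidence for this label
    cutoff = rel_threshold * peak
    rows = [r for r, row in enumerate(heatmap) if any(v >= cutoff for v in row)]
    cols = [c for row in heatmap for c, v in enumerate(row) if v >= cutoff]
    return min(rows), min(cols), max(rows) + 1, max(cols) + 1


def cut_out(image: Grid, box: Box) -> Grid:
    """Crop the image to the given box."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


class ReservoirBuffer:
    """Fixed-size replay memory updated by reservoir sampling (assumed policy)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[Tuple[Grid, str]] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, crop: Grid, label: str) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((crop, label))
        else:
            # Keep each of the `seen` items with equal probability.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (crop, label)

    def sample(self, k: int) -> List[Tuple[Grid, str]]:
        return self.rng.sample(self.items, min(k, len(self.items)))


# Toy 4x4 "image" and a heatmap for one label (values are made up).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 1.0, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

box = label_region_box(heatmap)
buffer = ReservoirBuffer(capacity=3)
if box is not None:
    buffer.add(cut_out(image, box), "cat")
```

Storing label-specific crops rather than whole images means each memory slot carries a single label with its supporting region, which is one plausible way such a buffer could sidestep missing labels on replayed samples. Whether CUTER stores crops in exactly this form is not stated in the abstract.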