Revisiting Continual Semantic Segmentation with Pre-trained Vision Models

📅 2025-08-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In continual semantic segmentation (CSS), catastrophic forgetting is commonly attributed to backbone fine-tuning; however, this work reveals that pre-trained vision models inherently exhibit strong forgetting resistance, and that classifier drift, rather than backbone adaptation, is the primary cause of forgetting. To address this, the authors propose DFT*, a simple yet effective framework that freezes both the pre-trained backbone and all previously learned classifiers, pre-allocates classifiers for future classes, and fine-tunes only the classifiers of newly introduced classes. By dispensing with conventional feature-space retraining, DFT* drastically reduces the trainable parameter count and computational overhead. Evaluated across eight CSS settings on Pascal VOC 2012 and ADE20K, DFT* consistently matches or outperforms 16 state-of-the-art methods, achieving substantial average mIoU gains while reducing trainable parameters by roughly 40% and training time by over 30%, empirically validating the "freeze-over-finetune" paradigm as a more effective and efficient approach to CSS.
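
The summary above hinges on a specific training recipe: keep the pre-trained backbone and all previously learned class heads frozen, pre-allocate heads for classes that will arrive later, and update only the head for the current step. The PyTorch sketch below illustrates that recipe under stated assumptions; it is not the authors' released code, and names such as FrozenBackboneSegmenter, classes_per_step, and set_incremental_step are illustrative.

```python
# Minimal sketch (an assumption, not the paper's implementation) of the "freeze-over-finetune"
# idea: freeze the pre-trained backbone and previously learned class heads, pre-allocate heads
# for all incremental steps, and train only the head of the current step.
import torch
import torch.nn as nn

class FrozenBackboneSegmenter(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, classes_per_step: list[int]):
        super().__init__()
        self.backbone = backbone  # pre-trained vision model, e.g. ResNet101 or Swin-B
        # Pre-allocate one 1x1-conv head per incremental step, including future steps.
        self.heads = nn.ModuleList(
            [nn.Conv2d(feat_dim, n_cls, kernel_size=1) for n_cls in classes_per_step]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                          # dense features, (B, feat_dim, H, W)
        logits = [head(feats) for head in self.heads]     # one logit map per step
        return torch.cat(logits, dim=1)                   # per-pixel logits over all classes

    def set_incremental_step(self, step: int) -> list[nn.Parameter]:
        """Freeze the backbone and all heads except the one for the current step."""
        for p in self.parameters():
            p.requires_grad = False
        trainable = list(self.heads[step].parameters())
        for p in trainable:
            p.requires_grad = True
        return trainable  # hand only these parameters to the optimizer

# Usage sketch for a two-step split (e.g. 16 base classes including background, then 5 new ones):
# model = FrozenBackboneSegmenter(backbone, feat_dim=2048, classes_per_step=[16, 5])
# params = model.set_incremental_step(step=1)
# optimizer = torch.optim.SGD(params, lr=0.01)
```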

📝 Abstract
Continual Semantic Segmentation (CSS) seeks to incrementally learn to segment novel classes while preserving knowledge of previously encountered ones. Recent advancements in CSS have been largely driven by the adoption of Pre-trained Vision Models (PVMs) as backbones. Among existing strategies, Direct Fine-Tuning (DFT), which sequentially fine-tunes the model across classes, remains the most straightforward approach. Prior work often regards DFT as a performance lower bound due to its presumed vulnerability to severe catastrophic forgetting, leading to the development of numerous complex mitigation techniques. However, we contend that this prevailing assumption is flawed. In this paper, we systematically revisit forgetting in DFT across two standard benchmarks, Pascal VOC 2012 and ADE20K, under eight CSS settings using two representative PVM backbones: ResNet101 and Swin-B. Through a detailed probing analysis, our findings reveal that existing methods significantly underestimate the inherent anti-forgetting capabilities of PVMs. Even under DFT, PVMs retain previously learned knowledge with minimal forgetting. Further investigation of the feature space indicates that the observed forgetting primarily arises from the classifier's drift away from the PVM, rather than from degradation of the backbone representations. Based on this insight, we propose DFT*, a simple yet effective enhancement to DFT that incorporates strategies such as freezing the PVM backbone and previously learned classifiers, as well as pre-allocating future classifiers. Extensive experiments show that DFT* consistently achieves competitive or superior performance compared to sixteen state-of-the-art CSS methods, while requiring substantially fewer trainable parameters and less training time.
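
A probing analysis of this kind typically fits a lightweight classifier on frozen features: the fine-tuned backbone is held fixed and only a fresh head is re-fit on old-class data, so the quality of the backbone's representations can be measured independently of the drifted classifier. The sketch below shows one plausible way to run such a probe; the exact protocol used in the paper is an assumption here, and backbone, probe_loader, feat_dim, and num_old_classes are placeholders.

```python
# Hedged sketch of a linear-probe style analysis: if a re-fit head on frozen features recovers
# high mIoU for old classes, forgetting is attributable to classifier drift rather than to
# degraded backbone representations.
import torch
import torch.nn as nn

def linear_probe(backbone: nn.Module, probe_loader, feat_dim: int, num_old_classes: int,
                 epochs: int = 5, lr: float = 0.01, device: str = "cuda") -> nn.Module:
    backbone.eval().to(device)
    for p in backbone.parameters():
        p.requires_grad = False                        # backbone stays fixed during probing

    probe = nn.Conv2d(feat_dim, num_old_classes, kernel_size=1).to(device)
    optimizer = torch.optim.SGD(probe.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss(ignore_index=255)  # 255 marks ignored pixels (common convention)

    for _ in range(epochs):
        for images, masks in probe_loader:             # masks: per-pixel old-class labels (long)
            images, masks = images.to(device), masks.to(device)
            with torch.no_grad():
                feats = backbone(images)               # (B, feat_dim, h, w)
            logits = probe(feats)
            logits = nn.functional.interpolate(logits, size=masks.shape[-2:],
                                               mode="bilinear", align_corners=False)
            loss = criterion(logits, masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return probe  # evaluate this probe's old-class mIoU against the drifted classifier's
```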
Problem

Research questions and friction points this paper is trying to address.

Reassessing catastrophic forgetting in continual semantic segmentation
Evaluating pre-trained vision models' anti-forgetting capabilities
Proposing enhanced direct fine-tuning to mitigate classifier drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

Revisiting Direct Fine-Tuning for CSS
Freezing the PVM backbone and previously learned classifiers
Pre-allocating future classifiers
Duzhen Zhang
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing, Multimodal, Large Language Models, Continual Learning, AI4Science
Yong Ren
Institute of Automation, Chinese Academy of Sciences
Speech Codec, Text-to-speech, Video-to-audio, MLLM, Continual Learning
Wei Cong
Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China
Junhao Zheng
South China University of Technology, Qwen Team
Large Language Models, Pretraining, Continual Learning
Qiaoyi Su
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Shuncheng Jia
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhong-Zhi Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Xuanle Zhao
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Ye Bai
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Feilong Chen
Huawei Inc.; previously at CASIA
(Native) Multimodal LLM, Multimodal Generation, Multimodal Reasoning, Omni-modal LLM
Qi Tian
Huawei Inc., Shenzhen, Guangdong, China
Tielin Zhang
Chinese Academy of Sciences
Spiking Neural Networks, Cognitive Computation, Computational Neuroscience