🤖 AI Summary
In open-vocabulary semantic segmentation (OVSS), existing training-free methods suffer from two key limitations: class redundancy, where the candidate vocabulary contains categories absent from the test image, and vision-language ambiguity, where semantically similar classes are confused with one another. Both degrade the quality of class activation maps and affinity-refined activation maps. To address these challenges without any parameter tuning, we propose FreeCP, a training-free class purification framework that operates directly on frozen pre-trained vision-language models (VLMs). FreeCP introduces a semantic-similarity-driven class purification strategy to eliminate redundant categories and a class activation correction mechanism to mitigate ambiguity-induced confusion. The framework is plug-and-play and requires no training or fine-tuning. Extensive experiments across eight benchmarks show that FreeCP significantly boosts segmentation performance when combined with existing OVSS methods, supporting its use as a general-purpose module for OVSS.
📝 Abstract
Fine-tuning pre-trained vision-language models has emerged as a powerful approach for enhancing open-vocabulary semantic segmentation (OVSS). However, the substantial computational and resource demands of training on large datasets have prompted interest in training-free methods for OVSS. Existing training-free approaches primarily focus on modifying model architectures and generating prototypes to improve segmentation performance. Yet they often neglect two challenges: class redundancy, where many candidate categories are not present in the current test image, and visual-language ambiguity, where semantic similarities among categories create confusion in class activation. These issues can lead to suboptimal class activation maps and affinity-refined activation maps. Motivated by these observations, we propose FreeCP, a novel training-free class purification framework designed to address these challenges. FreeCP purifies the set of semantic categories and rectifies errors caused by redundancy and ambiguity. The purified class representations are then used to produce the final segmentation predictions. We conduct extensive experiments across eight benchmarks to validate FreeCP's effectiveness. The results demonstrate that FreeCP, as a plug-and-play module, significantly boosts segmentation performance when combined with other OVSS methods.
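To make the idea of class redundancy concrete, the sketch below filters a candidate vocabulary down to the classes that plausibly appear in one image. It is an illustrative toy, not FreeCP's actual procedure, which the abstract does not specify: the use of a single global image embedding, the cosine-similarity score, the `threshold` value, and the names `cosine` and `purify_classes` are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def purify_classes(image_emb, class_embs, threshold=0.5):
    """Hypothetical purification step: keep classes whose text embedding is
    similar enough to the image embedding.

    image_emb:  list of floats, assumed to come from a frozen VLM image encoder.
    class_embs: dict mapping class name -> text embedding (list of floats).
    threshold:  assumed cut-off; not a value taken from the paper.
    """
    scores = {name: cosine(image_emb, emb) for name, emb in class_embs.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    # Fall back to the single best class so the vocabulary is never empty.
    if not kept and scores:
        kept = [max(scores, key=scores.get)]
    return kept, scores


# Toy 3-dimensional embeddings standing in for real VLM features.
image_emb = [1.0, 0.0, 0.0]
class_embs = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.6, 0.8, 0.0],
    "airplane": [0.0, 0.0, 1.0],
}
kept, scores = purify_classes(image_emb, class_embs, threshold=0.5)
```

In this toy setup "airplane" is dropped as redundant, while "cat" and "dog" both survive; separating such semantically close survivors is the ambiguity problem that the abstract says FreeCP addresses in addition to redundancy.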