🤖 AI Summary
Open-vocabulary semantic segmentation (OVSS) models perform well on standard benchmarks in zero-shot settings but adapt poorly to domain-specific images. To address this, the paper proposes Seg-TTO, a test-time optimization framework that requires no additional annotations. Seg-TTO uses a novel segmentation-specific self-supervised objective to align the model with each input image at test time: in the textual modality it learns multiple embeddings per category to capture the diverse concepts within an image, and in the visual modality it computes pixel-level losses and aggregates embeddings so that spatial structure is preserved. The framework is a plug-and-play module. Integrated with three state-of-the-art OVSS approaches and evaluated on 22 domain-specific OVSS tasks, Seg-TTO shows clear improvements and establishes a new state of the art, narrowing the gap between zero-shot and supervised segmentation in specialized domains.
📝 Abstract
We present Seg-TTO, a novel framework for zero-shot, open-vocabulary semantic segmentation (OVSS), designed to excel in specialized domain tasks. While current open-vocabulary approaches show impressive performance on standard segmentation benchmarks under zero-shot settings, they fall short of their supervised counterparts on highly domain-specific datasets. We focus on segmentation-specific test-time optimization to address this gap. Segmentation requires an understanding of multiple concepts within a single image while retaining the locality and spatial structure of representations. We propose a novel self-supervised objective that adheres to these requirements and use it to align the model parameters with input images at test time. In the textual modality, we learn multiple embeddings for each category to capture diverse concepts within an image, while in the visual modality, we calculate pixel-level losses followed by embedding aggregation operations designed to preserve spatial structure. Our resulting framework, termed Seg-TTO, is a plug-and-play module. We integrate Seg-TTO with three state-of-the-art OVSS approaches and evaluate across 22 challenging OVSS tasks covering a range of specialized domains. Seg-TTO demonstrates clear performance improvements across these tasks, establishing a new state of the art. Code: https://github.com/UlinduP/SegTTO.
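To make the idea of "multiple embeddings for each category" and "pixel-level" scoring concrete, the following is a minimal illustrative sketch in plain Python. It is not the Seg-TTO implementation: the toy 2-D embeddings, the max-over-embeddings aggregation, the temperature value, and the use of per-pixel entropy as a label-free signal are all assumptions chosen for illustration (entropy is a common stand-in in test-time adaptation, not necessarily the objective used in the paper). All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pixel_class_scores(pixel_emb, class_embs):
    """Score one pixel against every category.

    Each category owns several text embeddings; here (an assumption)
    the category score is the best match among them.
    """
    return {
        category: max(cosine(pixel_emb, emb) for emb in embs)
        for category, embs in class_embs.items()
    }

def softmax(scores, temperature=0.1):
    """Turn similarity scores into a probability distribution."""
    top = max(scores.values())
    exps = {c: math.exp((s - top) / temperature) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: v / total for c, v in exps.items()}

def pixel_entropy(probs):
    """Shannon entropy of one pixel's class distribution (label-free)."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def segment(pixel_grid, class_embs):
    """Label every pixel and return the mean per-pixel entropy."""
    labels, entropies = [], []
    for row in pixel_grid:
        label_row = []
        for emb in row:
            probs = softmax(pixel_class_scores(emb, class_embs))
            label_row.append(max(probs, key=probs.get))
            entropies.append(pixel_entropy(probs))
        labels.append(label_row)
    return labels, sum(entropies) / len(entropies)

# Toy data: two categories with two text embeddings each, and a 2x2 "image"
# whose pixels are already encoded as 2-D visual embeddings.
class_embs = {
    "road": [[1.0, 0.0], [0.9, 0.1]],
    "tree": [[0.0, 1.0], [0.1, 0.9]],
}
pixel_grid = [
    [[1.0, 0.05], [0.8, 0.2]],
    [[0.1, 1.0], [0.0, 0.7]],
]

labels, mean_entropy = segment(pixel_grid, class_embs)
print(labels)
print(mean_entropy)
```

In a test-time optimization setting, a quantity such as `mean_entropy` could serve as a loss that is minimized with respect to the text embeddings for each input image. The sketch only evaluates it once and performs no gradient updates.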