FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation

📅 2025-04-14

🤖 AI Summary

This work addresses the suboptimality of multi-template averaging in open-vocabulary semantic segmentation (OVSS). We observe that, for specific categories, single-template classifiers significantly outperform the averaging strategy, and we propose FLOSS, a zero-shot, annotation-free, and training-free method. FLOSS builds on CLIP's zero-shot transfer capability and uses category-level prediction entropy to identify, without supervision, "template-level category experts", which are then combined through a fusion step. Our contributions are threefold: (1) the first empirical demonstration that single-template classifiers act as latent class experts in OVSS; (2) a plug-and-play framework for entropy-driven expert selection and fusion that requires neither labels nor training; and (3) consistent state-of-the-art improvements across multiple OVSS benchmarks, with selected expert templates that generalize across datasets and gains that persist when only a small set of unlabeled images is available.

📝 Abstract

Recent Open-Vocabulary Semantic Segmentation (OVSS) models extend the CLIP model to segmentation while maintaining the use of multiple templates (e.g., "a photo of {class}", "a sketch of a {class}", etc.) for constructing class-wise averaged text embeddings, acting as a classifier. In this paper, we challenge this status quo and investigate the impact of templates for OVSS. Empirically, we observe that for each class, there exist single-template classifiers significantly outperforming the conventional averaged classifier. We refer to them as class-experts. Given access to unlabeled images and without any training involved, we estimate these experts by leveraging the class-wise prediction entropy of single-template classifiers, selecting as class-wise experts those which yield the lowest entropy. All experts, each specializing in a specific class, collaborate in a newly proposed fusion method to generate more accurate OVSS predictions. Our plug-and-play method, coined FLOSS, is orthogonal and complementary to existing OVSS methods, offering a "free lunch" to systematically improve OVSS without labels and additional training. Extensive experiments demonstrate that FLOSS consistently boosts state-of-the-art methods on various OVSS benchmarks. Moreover, the selected expert templates generalize well from one dataset to others that share the same semantic categories but exhibit distribution shifts. Additionally, we obtain satisfactory improvements under a low-data regime, where only a few unlabeled images are available. Our code is available at https://github.com/yasserben/FLOSS .
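
To make the contrast between the averaged classifier and a single-template classifier concrete, here is a minimal sketch, not the FLOSS code. `toy_text_encoder` is a hypothetical, deterministic stand-in for CLIP's text encoder, and the class names, embedding size and template strings are made up for illustration. The averaged classifier follows the common CLIP prompt-ensembling recipe (L2-normalize each template's embedding, average over templates, renormalize), and a pixel is labeled with the class whose weight vector has the highest cosine similarity with its feature.

```python
import math
import random

# Illustrative inputs (hypothetical): two templates in the spirit of the abstract's examples.
TEMPLATES = ["a photo of {}.", "a sketch of a {}."]
CLASSES = ["road", "car", "person"]
DIM = 8  # toy embedding size; real CLIP text embeddings are much larger


def toy_text_encoder(text):
    """HYPOTHETICAL stand-in for CLIP's text encoder: a deterministic pseudo-random vector per string."""
    rng = random.Random(text)  # seeding with a str is deterministic across runs
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def single_template_classifier(template):
    """One unit-norm weight vector per class, built from a single template."""
    return [l2_normalize(toy_text_encoder(template.format(c))) for c in CLASSES]


def averaged_classifier(templates):
    """Conventional multi-template classifier: mean of normalized embeddings, renormalized."""
    per_template = [single_template_classifier(t) for t in templates]
    weights = []
    for k in range(len(CLASSES)):
        mean = [sum(clf[k][d] for clf in per_template) / len(per_template) for d in range(DIM)]
        weights.append(l2_normalize(mean))
    return weights


def classify_pixel(feature, classifier):
    """Index of the class whose weight vector has the highest cosine similarity with the feature."""
    f = l2_normalize(feature)
    scores = [sum(a * b for a, b in zip(f, w)) for w in classifier]
    return max(range(len(scores)), key=scores.__getitem__)
```

For instance, `classify_pixel(feature, averaged_classifier(TEMPLATES))` labels a pixel with the averaged classifier, while `classify_pixel(feature, single_template_classifier(TEMPLATES[1]))` uses the second template alone; the observation above is that, for each class, some such single-template classifier can beat the average.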

Problem

Challenges the standard practice of averaging multiple templates in Open-Vocabulary Semantic Segmentation

Identifies single-template class-experts for better segmentation

Proposes a training-free fusion method that combines class-experts to improve OVSS
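
The fusion rule itself is not spelled out here, so the following is only a hypothetical illustration of how class-experts could be combined for a single pixel. It assumes each pixel already has one vector of class logits per template, and that an expert template index has been chosen for every class. Each class c is scored by the softmax probability that its own expert assigns to c, and the highest-scoring class wins. The 0.01 temperature (mirroring CLIP's logit scale of 100) and the function names are assumptions, not details from the paper.

```python
import math


def softmax(logits, temperature=0.01):
    """Temperature-scaled, numerically stable softmax (ASSUMED temperature ~ CLIP's logit scale of 100)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_experts(pixel_logits_per_template, experts):
    """HYPOTHETICAL fusion rule for one pixel (not the paper's rule).

    pixel_logits_per_template[t][c]: logit of class c under template t's classifier.
    experts[c]: index of the template selected as the expert for class c.
    Class c is scored only by the probability its own expert assigns to c;
    the index of the highest-scoring class is returned.
    """
    scores = [softmax(pixel_logits_per_template[t])[c] for c, t in enumerate(experts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Scoring each class only through its own expert keeps the rule free of labels and training; the actual FLOSS fusion may weight or combine experts differently.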

Innovation

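Single-template class-experts outperform the averaged classifier on their respective classes

Class-experts are selected without labels, as the templates yielding the lowest class-wise prediction entropy

Plug-and-play FLOSS method boosts existing OVSS models without labels or training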
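
Entropy-based expert selection can be sketched as follows, under stated assumptions. Here the class-wise entropy of a template is taken to be the mean softmax entropy over the unlabeled pixels that the template's classifier assigns to that class, and for each class the template with the lowest such mean is kept as its expert. The exact aggregation FLOSS uses is not given on this page; `class_wise_entropy`, `select_experts` and the 0.01 temperature are hypothetical names and values chosen for illustration.

```python
import math


def softmax(logits, temperature=0.01):
    """Temperature-scaled, numerically stable softmax (ASSUMED temperature ~ CLIP's logit scale of 100)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def class_wise_entropy(pixels, num_classes):
    """ASSUMPTION: mean entropy over the pixels this template's classifier assigns to each class.

    pixels: list of per-pixel logit vectors for one template, pooled over unlabeled images.
    A class that receives no pixels gets +inf, so this template is never preferred for it.
    """
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for pixel_logits in pixels:
        probs = softmax(pixel_logits)
        c = max(range(num_classes), key=probs.__getitem__)  # predicted class for this pixel
        sums[c] += entropy(probs)
        counts[c] += 1
    return [s / n if n else math.inf for s, n in zip(sums, counts)]


def select_experts(logits, num_classes):
    """logits[t]: pixel logits under template t. Returns experts[c] = lowest-entropy template for class c.

    If every template scores +inf for a class (no template predicts it), min() falls back to template 0.
    """
    table = [class_wise_entropy(pixels, num_classes) for pixels in logits]  # table[t][c]
    return [min(range(len(table)), key=lambda t: table[t][c]) for c in range(num_classes)]
```

Because only the model's own predictions are inspected, this procedure needs unlabeled images but no annotations or gradient updates, consistent with the training-free setting described above.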
