Personalized OVSS: Understanding Personal Concept in Open-Vocabulary Semantic Segmentation

📅 2025-07-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the limitation of open-vocabulary semantic segmentation (OVSS) in recognizing personalized textual descriptions (e.g., “my coffee mug”). We formally introduce **Personalized OVSS**, a novel task requiring precise localization of user-specific object instances within categories. To suppress false positives among semantically similar objects, we propose a **negative mask proposal mechanism**, and to enhance the discriminability of text prompts for personalization, we design a **visual embedding injection strategy**. Our method enables efficient fine-tuning using only a few image-mask pairs. Evaluated on three newly constructed benchmarks (FSS$^\text{per}$, CUB$^\text{per}$, and ADE$^\text{per}$), it achieves substantial gains in personalized region segmentation accuracy while preserving the original OVSS model’s generalization performance on open-vocabulary categories.

📝 Abstract

While open-vocabulary semantic segmentation (OVSS) can segment an image into semantic regions based on arbitrary text descriptions, even for classes unseen during training, it fails to understand personal texts (e.g., 'my mug cup') for segmenting regions of specific interest to users. This paper addresses challenges such as recognizing 'my mug cup' among 'multiple mug cups'. To overcome this challenge, we introduce a novel task termed *personalized open-vocabulary semantic segmentation* and propose a plug-in method based on text prompt tuning that recognizes personal visual concepts from a few image-mask pairs while maintaining the performance of the original OVSS. Based on the observation that reducing false predictions is essential when applying text prompt tuning to this task, our method employs a 'negative mask proposal' that captures visual concepts other than the personalized concept. We further improve performance by enriching the text prompts with injected visual embeddings of the personal concept. This approach enhances personalized OVSS without compromising the original OVSS performance. We demonstrate the superiority of our method on our newly established benchmarks for this task: FSS$^\text{per}$, CUB$^\text{per}$, and ADE$^\text{per}$.
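
To make the idea of injecting visual embeddings into a text prompt more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not specify how the injection is done, so the blending rule (a normalized weighted sum), the `alpha` weight, and all function names are assumptions made for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def mean_embedding(embeddings):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def inject_visual_embedding(text_embedding, visual_embeddings, alpha=0.5):
    """Hypothetical injection rule: blend the text prompt embedding with the
    mean visual embedding of the few reference images, then re-normalize.
    The weighted sum is an assumption, not the mechanism from the paper."""
    text = l2_normalize(text_embedding)
    visual = l2_normalize(mean_embedding(visual_embeddings))
    mixed = [(1 - alpha) * t + alpha * v for t, v in zip(text, visual)]
    return l2_normalize(mixed)


# Toy data: a generic "mug" text embedding and two reference-image embeddings
# of the user's own mug (all values are made up).
generic_prompt = [1.0, 0.0, 0.0]
my_mug_views = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]

personal_prompt = inject_visual_embedding(generic_prompt, my_mug_views)
print(cosine(generic_prompt, my_mug_views[0]))
print(cosine(personal_prompt, my_mug_views[0]))
```

In this toy setup the blended prompt sits closer to the reference views than the generic prompt does, which is the intuition behind enriching a text prompt with visual information about the personal concept.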

Problem

Research questions and friction points this paper is trying to address.

Recognizing personal visual concepts in segmentation

Reducing false predictions in personalized OVSS

Maintaining original OVSS performance while personalizing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Text prompt tuning for personal concept recognition

Negative mask proposal reduces false predictions

Visual embeddings enrich text prompt representation
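
The role of a negative alternative in reducing false predictions can be illustrated with a small sketch. This is a simplification under stated assumptions, not the paper's mechanism: each mask proposal is assumed to carry an embedding, and a proposal is kept for the personal concept only if it matches the personal prompt better than every negative embedding. The function name `select_personal_masks` and all toy values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_personal_masks(mask_embeddings, personal_prompt, negative_embeddings):
    """Return indices of mask proposals assigned to the personal concept.

    Hypothetical rule: a proposal is selected only when its similarity to the
    personal prompt exceeds its similarity to every negative embedding.
    With no negatives, every proposal is selected.
    """
    selected = []
    for index, embedding in enumerate(mask_embeddings):
        personal_score = cosine(embedding, personal_prompt)
        best_negative = max(
            (cosine(embedding, negative) for negative in negative_embeddings),
            default=float("-inf"),
        )
        if personal_score > best_negative:
            selected.append(index)
    return selected


# Toy data: three mask proposals; only the first depicts "my mug".
masks = [[0.9, 0.1], [0.6, 0.8], [0.1, 0.9]]
my_mug_prompt = [1.0, 0.0]
other_concepts = [[0.5, 0.9]]  # stands in for "visual concepts other than mine"

print(select_personal_masks(masks, my_mug_prompt, []))              # [0, 1, 2]
print(select_personal_masks(masks, my_mug_prompt, other_concepts))  # [0]
```

Without a negative alternative, every proposal is assigned to the personal concept, including the two that depict other objects. Adding a single negative embedding absorbs those regions and leaves only the intended one.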

🔎 Similar Papers

Auto-Vocabulary Semantic Segmentation