OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP

📅 2025-03-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses low-shot open-set domain generalization (LSOSDG), a novel task that confronts two key challenges: (1) the weak domain generalization capability of vision-language models (e.g., CLIP) under extreme low-shot supervision (e.g., 1-shot), and (2) inaccurate rejection of open-set samples exhibiting fine-grained semantic deviations. To tackle these, we propose a domain-agnostic prompt learning framework coupled with pseudo open-set sample synthesis. Specifically, we design learnable, domain- and class-agnostic visual prompts and incorporate a cross-attention module to explicitly model vision–language alignment. Additionally, we leverage foundation models to directionally synthesize pseudo open-set samples, thereby enhancing discriminative capacity for fine-grained unknown classes. Extensive experiments across five benchmarks demonstrate consistent and significant improvements over state-of-the-art methods—achieving superior low-shot domain generalization accuracy and markedly enhanced open-set rejection performance, particularly for fine-grained semantic outliers.
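The cross-attention module described above aligns learnable visual prompts with class-text embeddings. Below is a minimal NumPy sketch of that idea: prompt tokens act as queries attending over text embeddings. All names, shapes, and projection matrices here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(visual_prompts, text_embeds, Wq, Wk, Wv):
    """Scaled dot-product cross-attention: visual prompt tokens (queries)
    attend over class-text embeddings (keys/values)."""
    Q = visual_prompts @ Wq                    # (P, d) projected queries
    K = text_embeds @ Wk                       # (C, d) projected keys
    V = text_embeds @ Wv                       # (C, d) projected values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # (P, C) vision-language alignment
    return softmax(scores, axis=-1) @ V        # (P, d) text-conditioned prompts

rng = np.random.default_rng(0)
d = 8                                  # toy embedding width
prompts = rng.normal(size=(4, d))      # 4 learnable, domain- and class-agnostic prompts
texts = rng.normal(size=(6, d))        # 6 class-text embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(prompts, texts, Wq, Wk, Wv)
print(out.shape)
```

In the full framework these conditioned prompts would feed back into CLIP's encoders during prompt tuning; the sketch only shows the attention step itself.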


📝 Abstract
We introduce Low-Shot Open-Set Domain Generalization (LSOSDG), a novel paradigm unifying low-shot learning with open-set domain generalization (ODG). While prompt-based methods using models like CLIP have advanced DG, they falter in low-data regimes (e.g., 1-shot) and lack precision in detecting open-set samples with fine-grained semantics related to training classes. To address these challenges, we propose OSLOPROMPT, an advanced prompt-learning framework for CLIP with two core innovations. First, to manage limited supervision across source domains and improve DG, we introduce a domain-agnostic prompt-learning mechanism that integrates adaptable domain-specific cues and visually guided semantic attributes through a novel cross-attention module, besides being supported by learnable domain- and class-generic visual prompts to enhance cross-modal adaptability. Second, to improve outlier rejection during inference, we classify unfamiliar samples as "unknown" and train specialized prompts with systematically synthesized pseudo-open samples that maintain fine-grained relationships to known classes, generated through a targeted query strategy with off-the-shelf foundation models. This strategy enhances feature learning, enabling our model to detect open samples with varied granularity more effectively. Extensive evaluations across five benchmarks demonstrate that OSLOPROMPT establishes a new state-of-the-art in LSOSDG, significantly outperforming existing methods.
Problem

Research questions and friction points this paper is trying to address.

Addresses low-shot learning challenges in domain generalization.
Improves detection of open-set samples with fine-grained semantics.
Enhances cross-modal adaptability and outlier rejection in CLIP models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-agnostic prompt-learning with cross-attention
Specialized prompts for outlier rejection
Synthesized pseudo-open samples for training
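The rejection idea behind these contributions can be sketched as follows: alongside the known-class text embeddings, an extra "unknown" prompt embedding (trained on synthesized pseudo-open samples that stay close to known classes) adds one more logit, and a test sample whose top score lands on it is rejected. This is a minimal NumPy sketch under stated assumptions; the function names, temperature, and random embeddings are illustrative, not the paper's code.

```python
import numpy as np

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def classify(image_feat, class_embeds, unknown_embed, temp=0.07):
    """CLIP-style cosine-similarity logits over known classes plus one
    learned 'unknown' prompt; argmax == len(class_embeds) means reject."""
    embeds = np.vstack([class_embeds, unknown_embed[None, :]])
    logits = l2norm(image_feat) @ l2norm(embeds).T / temp
    return int(np.argmax(logits))

rng = np.random.default_rng(1)
d = 8
class_embeds = rng.normal(size=(5, d))   # 5 known class text embeddings
unknown_embed = rng.normal(size=(d,))    # learned from pseudo-open samples
# a pseudo-open sample: a small perturbation of a known class,
# mimicking the fine-grained semantic deviations the paper targets
pseudo_open = class_embeds[2] + 0.3 * rng.normal(size=(d,))
pred = classify(pseudo_open, class_embeds, unknown_embed)
print("rejected as unknown" if pred == len(class_embeds) else f"class {pred}")
```

In OSLoPrompt the unknown prompt is trained, so it sits near such borderline samples rather than being random; the sketch only illustrates the extra-logit rejection mechanics.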
Mohamad Hassan
Indian Institute of Technology Bombay
Divyam Gupta
Indian Institute of Technology Bombay
Mainak Singha
Marie-Curie & ELLIS PhD Fellow at University of Trento
Computer Vision · Vision-Language Models · Multimodal AI
Sai Bhargav Rongali
Indian Institute of Technology Bombay
Ankit Jha
Researcher and Faculty, CSE, The LNMIIT Jaipur
Remote Sensing · Computer Vision · Machine Learning · VLMs · Prompt Learning
Muhammad Haris Khan
Mohamed Bin Zayed University of Artificial Intelligence
Biplab Banerjee
Indian Institute of Technology Bombay