OpenworldAUC: Towards Unified Evaluation and Optimization for Open-world Prompt Tuning

📅 2025-05-08
🤖 AI Summary
Problem: Existing evaluation metrics for open-world prompt tuning lack a unified standard that simultaneously covers domain detection (distinguishing base from new domains), within-domain classification, and robustness to varying base-to-new sample ratios. Method: We propose OpenworldAUC, a metric that satisfies all three criteria. It jointly evaluates P1 (domain detection) and P2 (classification) through pairwise instance comparisons, and it is insensitive to the base/new sample ratio (P3). We further introduce Gated Mixture-of-Prompts (GMoP), which builds on CLIP's vision-language representations and uses domain-specific prompts with a gating mechanism to balance detection and classification. GMoP is designed to optimize OpenworldAUC and comes with theoretical generalization guarantees. Contribution/Results: Our approach achieves state-of-the-art performance across 15 open-world benchmarks on OpenworldAUC and on other metrics, including harmonic mean (HM) and AUROC. The code is publicly available.
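This toy Python sketch uses made-up per-sample outcomes, not results from the paper, to show why existing metrics miss at least one property. It assumes two things. First, HM is the harmonic mean of base and new accuracies, each measured with the true domain known in advance. Second, overall accuracy pools every sample from a two-stage model that must first pick the domain and then the class. The helper names are illustrative only.

```python
def harmonic_mean(acc_base, acc_new):
    """HM of base and new accuracies, each measured with the true domain known."""
    if acc_base + acc_new == 0:
        return 0.0
    return 2 * acc_base * acc_new / (acc_base + acc_new)


def overall_accuracy(base_correct, new_correct):
    """Accuracy over the pooled test set.

    Each flag is True only if the sample was routed to the right domain
    AND assigned the right class (the two-stage open-world setting).
    """
    total = len(base_correct) + len(new_correct)
    return (sum(base_correct) + sum(new_correct)) / total


# Made-up per-sample outcomes of a two-stage model (not data from the paper)
base = [True] * 9 + [False]       # 90% of base samples end up correct
new = [True] * 5 + [False] * 5    # 50% of new samples end up correct

print(overall_accuracy(base, new))      # 1:1 base/new ratio -> 0.7
print(overall_accuracy(base * 9, new))  # 9:1 ratio, same per-domain behaviour -> 0.86
print(harmonic_mean(0.9, 0.5))          # never sees a routing (P1) error
```

Per-domain behaviour stays identical across both calls. Overall accuracy still moves from 0.7 to 0.86, purely because base samples become nine times more common, which is the ratio sensitivity (P3). HM, by contrast, takes accuracies computed with the domain already given, so a detection failure never affects it (P1).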

📝 Abstract
Prompt tuning adapts Vision-Language Models like CLIP to open-world tasks with minimal training costs. In this direction, one typical paradigm evaluates model performance separately on known classes (i.e., base domain) and unseen classes (i.e., new domain). However, real-world scenarios require models to handle inputs without prior domain knowledge. This practical challenge has spurred the development of open-world prompt tuning, which demands a unified evaluation of two stages: 1) detecting whether an input belongs to the base or new domain (P1), and 2) classifying the sample into its correct class (P2). Moreover, as domain distributions are generally unknown, a proper metric should be insensitive to varying base/new sample ratios (P3). Yet we find that current metrics, including HM, overall accuracy, and AUROC, fail to satisfy these three properties simultaneously. To bridge this gap, we propose OpenworldAUC, a unified metric that jointly assesses detection and classification through pairwise instance comparisons. To optimize OpenworldAUC effectively, we introduce Gated Mixture-of-Prompts (GMoP), which employs domain-specific prompts and a gating mechanism to dynamically balance detection and classification. Theoretical guarantees ensure generalization of GMoP under practical conditions. Experiments on 15 benchmarks in open-world scenarios show GMoP achieves SOTA performance on OpenworldAUC and other metrics. We release the code at https://github.com/huacong/OpenworldAUC
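The abstract describes OpenworldAUC only as jointly assessing detection and classification "through pairwise instance comparisons." This Python sketch assumes one natural reading of that phrase. Over every (base, new) test pair, credit is given only when three conditions hold: the detection score ranks the base sample above the new one, and both samples are classified correctly within their own domain. The exact definition, the score convention (higher means "more likely base"), and half credit for ties are assumptions. The paper and the released code are authoritative.

```python
from itertools import product


def pair_score(s_base, s_new):
    """1 if the base sample outranks the new one, 0.5 on a tie, else 0."""
    if s_base > s_new:
        return 1.0
    if s_base == s_new:
        return 0.5
    return 0.0


def auroc(base_scores, new_scores):
    """Detection-only AUROC: share of (base, new) pairs ranked correctly."""
    pairs = list(product(base_scores, new_scores))
    return sum(pair_score(b, n) for b, n in pairs) / len(pairs)


def openworld_auc(base_scores, base_ok, new_scores, new_ok):
    """Assumed pairwise form: a (base, new) pair earns credit only if it is
    ranked correctly AND both samples are classified correctly in-domain."""
    total = 0.0
    for (sb, ok_b), (sn, ok_n) in product(zip(base_scores, base_ok),
                                          zip(new_scores, new_ok)):
        if ok_b and ok_n:
            total += pair_score(sb, sn)
    return total / (len(base_scores) * len(new_scores))


# Made-up example: detection is perfect, classification is not
bs, bok = [0.9, 0.8, 0.7, 0.6], [True, True, False, False]
ns, nok = [0.4, 0.3], [True, False]

print(auroc(bs, ns))                            # 1.0
print(openworld_auc(bs, bok, ns, nok))          # 0.25
print(openworld_auc(bs * 3, bok * 3, ns, nok))  # 0.25, base set tripled
```

AUROC reports a perfect 1.0 even though half the samples are misclassified, so it fails P2. The pairwise score reflects both stages. Because the pairwise score averages over all base-by-new pairs, replicating the base set leaves it unchanged, which is the ratio insensitivity (P3) the abstract asks for.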
Problem

Research questions and friction points this paper is trying to address.

Unified evaluation of both open-world prompt tuning stages (domain detection and classification)
Metric insensitive to varying domain sample ratios
Dynamic balance between detection and classification
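The two stages can be made concrete with a generic routing rule. P1 decides between base and new, and P2 then classifies within the chosen domain. This Python sketch is not GMoP. The domain score, the fixed threshold, and the per-domain logits are hypothetical stand-ins for whatever detector and domain-specific prompts a model provides, and GMoP's gating mechanism is not reproduced here.

```python
from collections.abc import Sequence


def two_stage_predict(domain_score: float,
                      base_logits: Sequence[float],
                      new_logits: Sequence[float],
                      threshold: float = 0.5) -> tuple[str, int]:
    """Hypothetical open-world inference: route (P1), then classify (P2).

    Returns the chosen domain and the index of the predicted class
    within that domain's own label set.
    """
    # P1: a hard threshold on the domain score picks base vs. new
    if domain_score >= threshold:
        domain, logits = "base", base_logits
    else:
        domain, logits = "new", new_logits
    # P2: argmax over the classes of the chosen domain only
    pred = max(range(len(logits)), key=logits.__getitem__)
    return domain, pred


print(two_stage_predict(0.8, [0.1, 2.0, 0.3], [5.0, 0.0]))  # ('base', 1)
print(two_stage_predict(0.2, [0.1, 2.0, 0.3], [5.0, 0.0]))  # ('new', 0)
```

With a hard routing step, a P1 mistake cannot be undone by P2, because the correct class is not even among the candidates. This is why detection and classification need to be evaluated, and balanced, jointly.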
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes OpenworldAUC for unified evaluation
Introduces Gated Mixture-of-Prompts (GMoP) with domain-specific prompts
Dynamically balances detection and classification