🤖 AI Summary
Existing evaluation metrics for open-world prompt tuning lack a unified standard that simultaneously addresses domain detection (distinguishing base vs. novel domains), fine-grained classification, and robustness to varying base-to-novel sample ratios.
Method: We propose OpenworldAUC, the first metric satisfying all three criteria: it jointly evaluates domain detection (P1) and classification (P2) through pairwise instance comparisons, and it is insensitive to the base-to-novel sample ratio (P3). We further introduce Gated Mixture-of-Prompts (GMoP), which combines domain-specific prompts with a gating mechanism to dynamically balance detection and classification; it comes with generalization guarantees and is optimized end-to-end for the unified OpenworldAUC objective.
Contribution/Results: Our approach achieves state-of-the-art performance across 15 open-world benchmarks, significantly improving OpenworldAUC, harmonic mean (HM), and AUROC. The code is publicly available.
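To make the GMoP idea concrete, here is a toy sketch of a gate routing between two domain-specific prompt experts. All shapes, names, and the sigmoid gate are illustrative assumptions, not the paper's implementation: a scalar gate acts as a base-vs-new detector, and each expert scores the image against its own domain's text prompts.

```python
import numpy as np

def gmop_forward(image_feat, base_text_feats, new_text_feats, gate_w):
    """Toy gated mixture of two domain-specific prompt experts
    (hypothetical sketch, not the paper's architecture).

    image_feat:      (d,) image embedding
    base_text_feats: (C_base, d) text embeddings from the base-domain prompt
    new_text_feats:  (C_new, d) text embeddings from the new-domain prompt
    gate_w:          (d,) parameters of a linear-sigmoid gate
    """
    # Gate g in (0, 1): high values indicate the base domain (detection, P1).
    g = 1.0 / (1.0 + np.exp(-image_feat @ gate_w))
    # Each expert classifies within its own domain (classification, P2).
    base_logits = base_text_feats @ image_feat
    new_logits = new_text_feats @ image_feat
    return g, base_logits, new_logits

# Usage: threshold g to pick a domain, then read off that expert's argmax.
g, base_logits, new_logits = gmop_forward(
    np.array([1.0, -1.0]),
    np.array([[1.0, 0.0], [0.0, 1.0]]),   # two base classes
    np.array([[0.5, 0.5]]),               # one new class
    np.array([0.5, 0.5]),
)
```

The point of the gate is that detection and classification are handled by separate components yet trained toward one objective, rather than being evaluated (or tuned) in isolation.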
📝 Abstract
Prompt tuning adapts Vision-Language Models like CLIP to open-world tasks with minimal training costs. In this direction, one typical paradigm evaluates model performance separately on known classes (i.e., base domain) and unseen classes (i.e., new domain). However, real-world scenarios require models to handle inputs without prior domain knowledge. This practical challenge has spurred the development of open-world prompt tuning, which demands a unified evaluation of two stages: 1) detecting whether an input belongs to the base or new domain (P1), and 2) classifying the sample into its correct class (P2). Moreover, as domain distributions are generally unknown, a proper metric should be insensitive to varying base/new sample ratios (P3). However, we find that current metrics, including HM, overall accuracy, and AUROC, fail to satisfy these three properties simultaneously. To bridge this gap, we propose OpenworldAUC, a unified metric that jointly assesses detection and classification through pairwise instance comparisons. To optimize OpenworldAUC effectively, we introduce Gated Mixture-of-Prompts (GMoP), which employs domain-specific prompts and a gating mechanism to dynamically balance detection and classification. Theoretical guarantees ensure generalization of GMoP under practical conditions. Experiments on 15 benchmarks in open-world scenarios show GMoP achieves SOTA performance on OpenworldAUC and other metrics. We release the code at https://github.com/huacong/OpenworldAUC.
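The "pairwise instance comparisons" idea can be sketched in a few lines. This is a minimal illustrative reading, not the paper's exact formula: each (base, new) sample pair counts as a success only when the detector ranks the base sample above the new one and both samples are classified correctly, so detection (P1) and classification (P2) are assessed jointly, and averaging per pair keeps the score insensitive to the base/new ratio (P3).

```python
def openworld_auc_sketch(det_scores_base, correct_base,
                         det_scores_new, correct_new):
    """Toy pairwise metric in the spirit of OpenworldAUC (illustrative).

    det_scores_base/new: detector scores (higher = more likely base domain)
    correct_base/new:    whether each sample's classification is correct
    """
    total, success = 0, 0.0
    for s_b, c_b in zip(det_scores_base, correct_base):
        for s_n, c_n in zip(det_scores_new, correct_new):
            total += 1
            if c_b and c_n:                 # both classified correctly (P2)
                if s_b > s_n:               # base ranked above new (P1)
                    success += 1.0
                elif s_b == s_n:            # AUC-style tie convention
                    success += 0.5
    return success / total

# Toy example: 3 base samples, 2 new samples.
m = openworld_auc_sketch([0.9, 0.8, 0.3], [True, True, False],
                         [0.4, 0.2], [True, False])
```

In this toy run only the pairs (0.9, 0.4) and (0.8, 0.4) satisfy all conditions, so m = 2/6. Note why HM or AUROC alone would miss this: AUROC scores only the ranking term, while per-domain accuracy ignores detection entirely.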