🤖 AI Summary
Existing evaluation metrics for open-world prompt tuning lack a unified standard that simultaneously addresses domain detection (distinguishing base vs. novel domains), fine-grained classification, and robustness to varying base-to-novel sample ratios.
Method: We propose OpenworldAUC, the first metric satisfying all three criteria: it jointly evaluates domain detection (P1) and classification (P2) through pairwise instance comparisons, and it is insensitive to the base-to-novel sample ratio (P3). We further introduce Gated Mixture-of-Prompts (GMoP), which combines domain-specific prompts with a gating mechanism to dynamically balance detection and classification; it comes with generalization guarantees and is optimized end-to-end for the unified OpenworldAUC objective.
Contribution/Results: Our approach achieves state-of-the-art performance across 15 open-world benchmarks, significantly improving OpenworldAUC, harmonic mean (HM), and AUROC. The code is publicly available.
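To make the GMoP idea concrete, here is a toy sketch of a gate routing between two domain-specific prompt experts. All shapes, names, and the sigmoid gate are illustrative assumptions, not the paper's implementation: a scalar gate acts as a base-vs-new detector, and each expert scores the image against its own domain's text prompts.

```python
import numpy as np

def gmop_forward(image_feat, base_text_feats, new_text_feats, gate_w):
    """Toy gated mixture of two domain-specific prompt experts
    (hypothetical sketch, not the paper's architecture).

    image_feat:      (d,) image embedding
    base_text_feats: (C_base, d) text embeddings from the base-domain prompt
    new_text_feats:  (C_new, d) text embeddings from the new-domain prompt
    gate_w:          (d,) parameters of a linear-sigmoid gate
    """
    # Gate g in (0, 1): high values indicate the base domain (detection, P1).
    g = 1.0 / (1.0 + np.exp(-image_feat @ gate_w))
    # Each expert classifies within its own domain (classification, P2).
    base_logits = base_text_feats @ image_feat
    new_logits = new_text_feats @ image_feat
    return g, base_logits, new_logits

# Usage: threshold g to pick a domain, then read off that expert's argmax.
g, base_logits, new_logits = gmop_forward(
    np.array([1.0, -1.0]),
    np.array([[1.0, 0.0], [0.0, 1.0]]),   # two base classes
    np.array([[0.5, 0.5]]),               # one new class
    np.array([0.5, 0.5]),
)
```

The point of the gate is that detection and classification are handled by separate components yet trained toward one objective, rather than being evaluated (or tuned) in isolation.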
📝 Abstract
Prompt tuning adapts Vision-Language Models like CLIP to open-world tasks with minimal training costs. In this direction, one typical paradigm evaluates model performance separately on known classes (i.e., base domain) and unseen classes (i.e., new domain). However, real-world scenarios require models to handle inputs without prior domain knowledge. This practical challenge has spurred the development of open-world prompt tuning, which demands a unified evaluation of two stages: 1) detecting whether an input belongs to the base or new domain (P1), and 2) classifying the sample into its correct class (P2). Moreover, as domain distributions are generally unknown, a proper metric should be insensitive to varying base/new sample ratios (P3). However, we find that current metrics, including HM, overall accuracy, and AUROC, fail to satisfy these three properties simultaneously. To bridge this gap, we propose OpenworldAUC, a unified metric that jointly assesses detection and classification through pairwise instance comparisons. To optimize OpenworldAUC effectively, we introduce Gated Mixture-of-Prompts (GMoP), which employs domain-specific prompts and a gating mechanism to dynamically balance detection and classification. Theoretical guarantees ensure generalization of GMoP under practical conditions. Experiments on 15 benchmarks in open-world scenarios show GMoP achieves SOTA performance on OpenworldAUC and other metrics. We release the code at https://github.com/huacong/OpenworldAUC.
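The "pairwise instance comparisons" idea can be sketched in a few lines. This is a minimal illustrative reading, not the paper's exact formula: each (base, new) sample pair counts as a success only when the detector ranks the base sample above the new one and both samples are classified correctly, so detection (P1) and classification (P2) are assessed jointly, and averaging per pair keeps the score insensitive to the base/new ratio (P3).

```python
def openworld_auc_sketch(det_scores_base, correct_base,
                         det_scores_new, correct_new):
    """Toy pairwise metric in the spirit of OpenworldAUC (illustrative).

    det_scores_base/new: detector scores (higher = more likely base domain)
    correct_base/new:    whether each sample's classification is correct
    """
    total, success = 0, 0.0
    for s_b, c_b in zip(det_scores_base, correct_base):
        for s_n, c_n in zip(det_scores_new, correct_new):
            total += 1
            if c_b and c_n:                 # both classified correctly (P2)
                if s_b > s_n:               # base ranked above new (P1)
                    success += 1.0
                elif s_b == s_n:            # AUC-style tie convention
                    success += 0.5
    return success / total

# Toy example: 3 base samples, 2 new samples.
m = openworld_auc_sketch([0.9, 0.8, 0.3], [True, True, False],
                         [0.4, 0.2], [True, False])
```

In this toy run only the pairs (0.9, 0.4) and (0.8, 0.4) satisfy all conditions, so m = 2/6. Note why HM or AUROC alone would miss this: AUROC scores only the ranking term, while per-domain accuracy ignores detection entirely.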