A-TPT: Angular Diversity Calibration Properties for Test-Time Prompt Tuning of Vision-Language Models

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing test-time prompt tuning (TPT) methods struggle to ensure angular dispersion of text features in unsupervised settings, degrading vision-language model (VLM) calibration performance and undermining reliability and generalization. To address this, we introduce **angular diversity** into the TPT framework for the first time, proposing to maximize the minimum pairwise angular distance among text features on the unit hypersphere—thereby achieving more uniform and discriminative inter-class feature distributions. Our label-free approach optimizes angular structure via prefix prompt learning. Extensive experiments across multiple backbone models and datasets demonstrate significant reductions in expected calibration error (ECE) without sacrificing accuracy. Moreover, our method exhibits superior calibration robustness and generalization under challenging zero-shot scenarios, including natural distribution shifts and medical imaging data.

📝 Abstract
Test-time prompt tuning (TPT) has emerged as a promising technique for adapting large vision-language models (VLMs) to unseen tasks without relying on labeled data. However, the lack of dispersion between textual features can hurt calibration performance, which raises concerns about VLMs' reliability, trustworthiness, and safety. Current TPT approaches primarily focus on improving prompt calibration by either maximizing average textual feature dispersion or enforcing orthogonality constraints to encourage angular separation. However, these methods may not always have optimal angular separation between class-wise textual features, which implies overlooking the critical role of angular diversity. To address this, we propose A-TPT, a novel TPT framework that introduces angular diversity to encourage uniformity in the distribution of normalized textual features induced by corresponding learnable prompts. This uniformity is achieved by maximizing the minimum pairwise angular distance between features on the unit hypersphere. We show that our approach consistently surpasses state-of-the-art TPT methods in reducing the aggregate average calibration error while maintaining comparable accuracy through extensive experiments with various backbones on different datasets. Notably, our approach exhibits superior zero-shot calibration performance on natural distribution shifts and generalizes well to medical datasets. We provide extensive analyses, including theoretical aspects, to establish the grounding of A-TPT. These results highlight the potency of promoting angular diversity to achieve well-dispersed textual features, significantly improving VLM calibration during test-time adaptation. Our code will be made publicly available.
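The core objective described above — maximizing the minimum pairwise angular distance among L2-normalized class text features on the unit hypersphere — can be sketched as a loss term. This is a minimal NumPy illustration of that idea, not the authors' implementation; the function name and the use of a hard (non-smooth) minimum are assumptions for clarity (a smooth surrogate such as log-sum-exp would typically be used in practice for gradient-based prompt tuning).

```python
import numpy as np

def min_pairwise_angle_loss(text_features):
    """Negative minimum pairwise angular distance among normalized
    text features: minimizing this loss pushes the closest pair of
    class features apart on the unit hypersphere."""
    # Project each class's text feature onto the unit hypersphere
    f = text_features / np.linalg.norm(text_features, axis=1, keepdims=True)
    # Pairwise cosine similarities, clipped for numerical safety
    cos = np.clip(f @ f.T, -1.0, 1.0)
    # Pairwise angular distances in radians
    angles = np.arccos(cos)
    # Exclude self-pairs (angle 0) before taking the minimum
    n = f.shape[0]
    mask = ~np.eye(n, dtype=bool)
    return -angles[mask].min()

# Three mutually orthogonal features: minimum pairwise angle is pi/2
print(min_pairwise_angle_loss(np.eye(3)))  # → about -1.5708
```

In a TPT loop this term would be added to the adaptation objective so the learnable prompt prefixes induce well-separated, uniformly dispersed text features without any labels.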
Problem

Research questions and friction points this paper is trying to address.

Improves calibration by maximizing angular diversity of textual features
Addresses poor dispersion in test-time prompt tuning of vision-language models
Reduces calibration error while maintaining accuracy across diverse datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces angular diversity for uniform feature distribution
Maximizes minimum pairwise angular distance on hypersphere
Improves calibration error while maintaining comparable accuracy
Shihab Aaqil Ahamed
Visiting Student Researcher at MBZUAI | Graduate at ENTC, University of Moratuwa
Model Calibration · 3D Vision · Self-Supervised Learning · Computer Vision · Machine Learning
Udaya S. K. P. Miriya Thanthrige
Dept. of Electronic and Telecommunication Engineering, University of Moratuwa, Sri Lanka
Ranga Rodrigo
Department of Electronic and Telecommunication Engineering, University of Moratuwa
Computer Vision
Muhammad Haris Khan
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE