🤖 AI Summary
This work addresses the poor uncertainty calibration of vision-language models under test-time prompt tuning, where overconfidence often undermines reliability. While existing full-orthogonality constraints improve prototype separation, they inadvertently disrupt the proximity of semantically related categories. This study is the first to identify and analyze this detrimental trade-off. To reconcile prototype separation with semantic coherence, the authors propose Semantic Orthogonal Calibration (SoC), a novel method based on Huber regularization that preserves semantic similarity among related classes while enhancing inter-class separability. Extensive experiments across multiple benchmarks demonstrate that the proposed method significantly improves calibration without compromising discriminative capability, achieving competitive accuracy alongside well-calibrated predictions.
📝 Abstract
With the increasing adoption of vision-language models (VLMs) in critical decision-making systems such as healthcare and autonomous driving, the calibration of their uncertainty estimates becomes paramount. Yet, this dimension has been largely underexplored in the VLM test-time prompt-tuning (TPT) literature, which has predominantly focused on improving discriminative performance. Recent state-of-the-art methods advocate enforcing full orthogonality over pairs of text prompt embeddings to enhance separability, and therefore calibration. Nevertheless, as we show theoretically in this work, the gradients induced by full orthogonality constraints strongly push semantically related classes apart, ultimately making the model overconfident. Based on these findings, we propose Semantic Orthogonal Calibration (SoC), a Huber-based regularizer that enforces smooth prototype separation while preserving semantic proximity, thereby improving calibration over prior orthogonality-based approaches. Through a comprehensive empirical validation, we demonstrate that SoC consistently improves calibration while maintaining competitive discriminative capabilities.
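The abstract contrasts full orthogonality (a quadratic penalty driving every pairwise prototype similarity toward zero, with gradients that grow with similarity) against a Huber-based penalty whose gradient saturates, so semantically related classes are not pushed apart as aggressively. The sketch below illustrates this intuition only; the function name, the choice to apply the penalty to pairwise cosine similarities, and the threshold value are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def huber(x, delta=0.1):
    # Standard Huber function: quadratic near zero, linear beyond delta,
    # so its gradient is capped at delta for large |x|.
    ax = np.abs(x)
    return np.where(ax <= delta, 0.5 * x**2, delta * (ax - 0.5 * delta))

def huber_separation_penalty(prototypes, delta=0.1):
    # Illustrative sketch (not the paper's exact SoC objective):
    # penalize pairwise cosine similarities between class prompt
    # embeddings with a Huber loss. A full-orthogonality penalty
    # (0.5 * s**2 per pair) would push high-similarity pairs apart
    # with a gradient proportional to s; the Huber variant bounds
    # that gradient, leaving related classes closer together.
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    S = P @ P.T                          # pairwise cosine similarities
    iu = np.triu_indices(len(P), k=1)    # distinct pairs only
    return float(huber(S[iu], delta).mean())
```

For already-orthogonal prototypes the penalty is zero, and for highly similar pairs it grows only linearly rather than quadratically, which is the calibration-relevant difference the abstract highlights.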