Improving Calibration in Test-Time Prompt Tuning for Vision-Language Models via Data-Free Flatness-Aware Prompt Pretraining

📅 2026-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the poor calibration that Test-Time Prompt Tuning (TPT) often induces in vision-language models, which undermines prediction reliability even as adaptability improves. The authors propose Flatness-aware Prompt Pretraining (FPP), which pretrains prompts so that TPT starts from a flatter region of the loss landscape. FPP requires no labeled data and changes nothing in an existing TPT pipeline except the prompt initialization. It adds no computational cost during test-time tuning, and the authors report that it improves both calibration and performance.
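Calibration here means that a model's stated confidence matches how often it is right. A common way to quantify this is the expected calibration error (ECE). The following is a minimal sketch of a standard equal-width-bin ECE, not code from the paper; the function name `expected_calibration_error` and the choice of 10 bins are assumptions made for illustration.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative helper, not taken from the paper).

    confidences: predicted probability of the chosen class, in (0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)

    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are half-open (lo, hi]; a confidence of exactly 0 is ignored.
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        accuracy = correct[in_bin].mean()
        avg_confidence = confidences[in_bin].mean()
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += in_bin.mean() * abs(accuracy - avg_confidence)
    return ece


if __name__ == "__main__":
    # Overconfident toy model: 95% confidence but only 3 of 4 correct.
    print(expected_calibration_error([0.95] * 4, [1, 1, 1, 0]))
```

A lower value indicates better calibration; a model that is fully confident and always correct scores 0.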
📝 Abstract
Test-time prompt tuning (TPT) has emerged as a promising technique for enhancing the adaptability of vision-language models by optimizing textual prompts using unlabeled test data. However, prior studies have observed that TPT often produces poorly calibrated models, raising concerns about the reliability of their predictions. Recent works address this issue by incorporating additional regularization terms that constrain model outputs, which improve calibration but often degrade performance. In this work, we reveal that these regularization strategies implicitly encourage optimization toward flatter minima, and that the sharpness of the loss landscape around adapted prompts is a key factor governing calibration quality. Motivated by this observation, we introduce Flatness-aware Prompt Pretraining (FPP), a simple yet effective pretraining framework for TPT that initializes prompts within flatter regions of the loss landscape prior to adaptation. We show that simply replacing the initialization in existing TPT pipelines, without modifying any other components, is sufficient to improve both calibration and performance. Notably, FPP requires no labeled data and incurs no additional computational costs during test-time tuning, making it highly practical for real-world deployment. The code is available at: https://github.com/YonseiML/fpp.
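To make "sharpness of the loss landscape" concrete, the sketch below estimates a generic sharpness proxy: the largest loss increase found among random perturbations of a fixed norm around a parameter vector. This is an illustrative assumption, not the paper's sharpness measure and not the FPP procedure; `sharpness`, `flat_loss`, `sharp_loss`, and the radius `rho` are hypothetical names and values, and the quadratic losses stand in for a real prompt-tuning objective.

```python
import numpy as np


def sharpness(loss_fn, params, rho=0.05, n_samples=256, seed=0):
    """Largest sampled loss increase on a sphere of radius rho around params.

    A generic proxy chosen for illustration; the paper may define sharpness
    differently.
    """
    rng = np.random.default_rng(seed)
    params = np.asarray(params, dtype=float)
    base = loss_fn(params)
    worst = 0.0
    for _ in range(n_samples):
        direction = rng.normal(size=params.shape)
        # Rescale the random direction so its norm is exactly rho.
        direction *= rho / np.linalg.norm(direction)
        worst = max(worst, loss_fn(params + direction) - base)
    return worst


def flat_loss(p):
    # Toy objective with low curvature (a "flat" minimum at the origin).
    return 0.5 * np.sum(p ** 2)


def sharp_loss(p):
    # Toy objective with high curvature (a "sharp" minimum at the origin).
    return 50.0 * np.sum(p ** 2)


if __name__ == "__main__":
    start = np.zeros(4)  # stands in for a prompt embedding at a minimum
    print("flat :", sharpness(flat_loss, start))
    print("sharp:", sharpness(sharp_loss, start))
```

Both toy losses share the same minimizer, yet the second reacts far more strongly to a small perturbation, which is the property the abstract links to poorer calibration after adaptation.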
Problem

Research questions and friction points this paper is trying to address.

test-time prompt tuning
calibration
vision-language models
loss landscape sharpness
model reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time prompt tuning
calibration
flatness-aware pretraining
vision-language models
data-free