Dual-Modality Anchor-Guided Filtering for Test-time Prompt Tuning

📅 2026-04-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses unreliable model confidence under distribution shift in test-time prompt tuning, which makes it hard to select informative image views. The authors propose a dual-modality anchor-guided framework: a text anchor supplies fine-grained semantic priors, and an adaptive image anchor captures statistics of the test samples. Views are selected jointly on their alignment with the anchors and on confidence, and the anchors' predictions are fused with the model's original output to form a stable supervision signal for prompt updates. By grounding view selection and prompt optimization in semantic evidence rather than relying solely on the model's miscalibrated internal confidence, the method reports new state-of-the-art performance across 15 benchmark datasets.
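The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fuse_predictions` is hypothetical, and the weighting rule (each head weighted by its maximum class probability, normalized across heads) is an assumption chosen only to show what a confidence-weighted ensemble might look like.

```python
def fuse_predictions(head_probs):
    """Fuse per-head class distributions with confidence weights.

    head_probs: list of probability vectors, e.g. one each for the
    original output, the text anchor, and the image anchor.
    ASSUMPTION: a head's confidence is its maximum class probability.
    """
    confidences = [max(probs) for probs in head_probs]
    total = sum(confidences)
    # Normalize so the weights sum to 1 and the result stays a distribution.
    weights = [c / total for c in confidences]
    num_classes = len(head_probs[0])
    fused = [
        sum(w * probs[j] for w, probs in zip(weights, head_probs))
        for j in range(num_classes)
    ]
    return fused, weights


# Illustrative values only: the original head is unsure, both anchors are confident.
heads = [
    [0.4, 0.6],  # original model output
    [0.9, 0.1],  # text-anchor head
    [0.8, 0.2],  # image-anchor head
]
fused, weights = fuse_predictions(heads)
```

In this toy example the two confident anchor heads outweigh the uncertain original head, so the fused distribution favors class 0. Whether the method uses maximum probability, entropy, or another confidence measure is not stated in the summary.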

📝 Abstract
Test-Time Prompt Tuning (TPT) adapts vision-language models using augmented views, but its effectiveness is hindered by the challenge of determining which views are beneficial. Standard entropy-based filtering relies on the internal confidence scores of the model, which are often miscalibrated under distribution shift, assigning high confidence to irrelevant crops or background regions while ignoring semantic content. To address this, we propose a dual-modality anchor-guided framework that grounds view selection in semantic evidence. We introduce a text anchor, derived from attribute-rich descriptions, that provides fine-grained class semantics, and an adaptive image anchor that captures evolving test-time statistics. Using these anchors, we filter views based on alignment and confidence, ensuring that only informative views guide adaptation. Moreover, we treat the anchors as auxiliary predictive heads and combine their predictions with the original output in a confidence-weighted ensemble, yielding a stable supervision signal for prompt updates. Extensive experiments on 15 benchmark datasets demonstrate new state-of-the-art performance, highlighting the contribution of anchor-guided supervision as a foundation for robust prompt updates.
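To make the contrast between the two filtering strategies concrete, here is a small sketch under stated assumptions. The entropy baseline keeps the lowest-entropy views. The anchor-guided variant is hypothetical: the function name `select_by_anchor`, the use of a single anchor vector, and the scoring rule (cosine alignment multiplied by maximum softmax probability) are illustrative choices, not details taken from the paper.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_by_entropy(view_logits, keep_ratio=0.1):
    """Baseline: keep the views whose predictions have the lowest entropy."""
    k = max(1, int(len(view_logits) * keep_ratio))
    order = sorted(range(len(view_logits)),
                   key=lambda i: entropy(softmax(view_logits[i])))
    return sorted(order[:k])


def select_by_anchor(view_embs, view_logits, anchor, keep_ratio=0.1):
    """Hypothetical anchor-guided filter.

    ASSUMPTION: score = cosine(view embedding, anchor) * max softmax prob.
    """
    scores = [cosine(emb, anchor) * max(softmax(logits))
              for emb, logits in zip(view_embs, view_logits)]
    k = max(1, int(len(scores) * keep_ratio))
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sorted(order[:k])


# Toy data: view 0 is a confident "background crop" orthogonal to the anchor.
view_logits = [[5.0, 0.0, 0.0], [1.0, 0.5, 0.0], [2.0, 0.0, 0.0]]
view_embs = [[0.0, 1.0], [1.0, 0.1], [1.0, 0.0]]
anchor = [1.0, 0.0]

kept_entropy = select_by_entropy(view_logits, keep_ratio=0.34)
kept_anchor = select_by_anchor(view_embs, view_logits, anchor, keep_ratio=0.34)
```

In this toy setup the entropy filter keeps view 0, the overconfident crop, while the anchor-guided score discards it because its embedding does not align with the anchor. This mirrors the failure mode the abstract attributes to entropy-based filtering, though the paper's actual scoring function may differ.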
Problem

Research questions and friction points this paper is trying to address.

Test-Time Prompt Tuning
view selection
distribution shift
semantic content
confidence miscalibration
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-modality
anchor-guided filtering
test-time prompt tuning
semantic alignment
confidence-weighted ensemble