🤖 AI Summary
Existing test-time online prompt tuning methods for vision-language models neglect inter-sample dependencies and are prone to prompt collapse from error accumulation, which limits zero-shot generalization. To address this, we propose a dynamic online prompt buffering and adaptive selection framework. Our method introduces a dual-criterion dynamic prompt selection strategy based on prediction entropy and probability margin, together with learnable prompt appending and pruning mechanisms that exploit the test data distribution while suppressing error accumulation. Crucially, the approach requires no fine-tuning of model parameters and operates entirely at zero-shot test time. Evaluated across fourteen cross-domain datasets, it achieves significant improvements in zero-shot accuracy, effectively mitigates prompt collapse, and enhances both test-time adaptability and robustness.
📝 Abstract
Test-time prompt tuning enhances zero-shot generalization of vision-language models but tends to ignore the relatedness among test samples during inference. Online test-time prompt tuning provides a simple way to leverage the information in previous test samples, albeit with the risk of prompt collapse due to error accumulation. To enhance test-time prompt tuning, we propose DynaPrompt, short for dynamic test-time prompt tuning, which exploits relevant data distribution information while reducing error accumulation. Built on an online prompt buffer, DynaPrompt adaptively selects and optimizes the relevant prompts for each test sample during tuning. Specifically, we introduce a dynamic prompt selection strategy based on two metrics: prediction entropy and probability difference. To incorporate information from unseen test data, we develop dynamic prompt appending, which allows the buffer to append new prompts and delete inactive ones. In this way, the prompts are optimized to exploit beneficial information on specific test data while alleviating error accumulation. Experiments on fourteen datasets demonstrate the effectiveness of dynamic test-time prompt tuning.
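The dual-criterion selection described above can be illustrated with a minimal sketch. Note the thresholds (`entropy_thresh`, `margin_thresh`) and the exact selection rule here are illustrative assumptions, not the paper's reported settings; the sketch only shows the idea of keeping buffered prompts whose predictions are both low-entropy and decisive (large top-1/top-2 probability gap):

```python
import math

def entropy(probs):
    # Shannon entropy of a class-probability vector (lower = more confident)
    return -sum(p * math.log(p + 1e-12) for p in probs)

def margin(probs):
    # Gap between the top-1 and top-2 class probabilities (higher = more decisive)
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def select_prompts(buffer_probs, entropy_thresh=1.0, margin_thresh=0.2):
    """Return indices of buffered prompts whose predictive distributions
    satisfy both criteria: low entropy AND large probability margin.
    Thresholds are hypothetical placeholders for illustration."""
    return [
        i for i, p in enumerate(buffer_probs)
        if entropy(p) < entropy_thresh and margin(p) > margin_thresh
    ]

# Toy example: each entry is one prompt's predictive distribution over 3 classes
buffer_probs = [
    [0.90, 0.05, 0.05],  # confident, large margin -> kept
    [0.40, 0.35, 0.25],  # high entropy, small margin -> filtered out
]
print(select_prompts(buffer_probs))  # -> [0]
```

Prompts that repeatedly fail both criteria would be candidates for deletion from the buffer, while new prompts are appended to capture unseen test distributions.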