🤖 AI Summary
Existing CLIP prompt learning methods rely on static text anchors whose values and positions are both fixed, which limits adaptability across tasks and training stages. To address this, we propose AnchorOPT, a framework that makes both anchor values and anchor positions dynamic. AnchorOPT learns anchor tokens from task data rather than handcrafting them, and arranges anchors and soft tokens through a learnable position matrix conditioned on the task and training stage. Training follows a lightweight two-stage strategy (anchors first, then soft tokens and the position matrix) and requires no additional regularization or complex architectural components. Across diverse datasets, AnchorOPT performs on par with or better than some methods that rely on additional learnable modules or regularization, despite its simpler design. As a plug-and-play module, it also yields consistent gains when integrated into existing CLIP prompt learning frameworks.
📝 Abstract
Existing prompt learning methods built upon CLIP models use textual tokens as anchors to guide the learnable soft tokens, which improves CLIP's generalization. However, these anchors are static in both value and position, so they cannot adapt across tasks or training stages. To address this limitation, we propose AnchorOPT, a dynamic anchor-based prompt learning framework. Specifically, AnchorOPT introduces dynamism in two key dimensions: (i) anchor values are no longer handcrafted explicit textual tokens (e.g., "shape", "color") but are learned from task-specific data; and (ii) the positional relationship between anchor and soft tokens is no longer fixed but is adaptively optimized via a learnable position matrix conditioned on the training stage and task context. Training occurs in two stages: we first learn the anchor tokens, then freeze them and transfer them to the second stage, where the soft tokens and the position matrix are optimized. Extensive experiments demonstrate that using only a simple learnable anchor and position matrix achieves performance comparable to or exceeding that of some methods that incorporate additional learnable modules or regularization techniques. As a plug-and-play module, AnchorOPT integrates seamlessly into existing frameworks, yielding consistent performance gains across diverse datasets. Code is publicly available at https://github.com/zhengli97/ATPrompt.
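The two ideas in the abstract, a learnable position matrix over anchor and soft tokens and a two-stage training schedule, can be illustrated with a small sketch. This is not the authors' implementation: the names `arrange_prompt`, `trainable_parameters`, and `position_logits` are hypothetical, the row-wise softmax is one assumed way to relax token placement, and the conditioning on task context and training stage is omitted. The abstract also does not say what else, if anything, is trained alongside the anchors in stage one.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def arrange_prompt(position_logits, anchor_tokens, soft_tokens):
    """Place anchor and soft tokens into prompt slots via a soft position matrix.

    Assumption: each prompt slot is a convex combination of all tokens,
    with weights given by a row-wise softmax of learnable logits.
    """
    tokens = anchor_tokens + soft_tokens  # (n_anchor + n_soft) token vectors
    position_matrix = [softmax(row) for row in position_logits]
    dim = len(tokens[0])
    prompt = [
        [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        for weights in position_matrix
    ]
    return prompt, position_matrix


def trainable_parameters(stage):
    """Hypothetical schedule: which parameter groups are updated in each stage."""
    if stage == 1:
        return {"anchor_tokens"}  # stage 1: learn the anchors
    if stage == 2:
        return {"soft_tokens", "position_logits"}  # stage 2: anchors frozen
    raise ValueError("stage must be 1 or 2")


# Toy 2-dimensional embeddings: one anchor token and two soft tokens.
anchor_tokens = [[1.0, 0.0]]
soft_tokens = [[0.0, 1.0], [0.5, 0.5]]

# Sharply peaked logits approximate a hard placement: the anchor (token 0)
# lands in the middle slot instead of the first one.
position_logits = [
    [0.0, 20.0, 0.0],
    [20.0, 0.0, 0.0],
    [0.0, 0.0, 20.0],
]

prompt, position_matrix = arrange_prompt(position_logits, anchor_tokens, soft_tokens)
```

With peaked logits the position matrix behaves almost like a permutation, so the result is close to a plain reordering of the tokens. Flatter logits would blend tokens across slots, which is what makes the placement differentiable in this sketch.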