Prompt Tuning for CLIP on the Pretrained Manifold

📅 2026-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the degradation of generalization in prompt tuning under limited supervision, where existing methods often disrupt the pretrained model’s representation structure, causing features to deviate from the pretrained manifold. To mitigate this issue, the authors propose ManiPT, a novel framework that constrains prompt tuning strictly within the pretrained manifold. ManiPT leverages a dual-modality cosine consistency constraint and structural bias to guide optimization, complemented by an incremental correction mechanism that alleviates few-shot overfitting at the geometric level. Extensive experiments based on CLIP demonstrate that ManiPT consistently outperforms baseline methods across four challenging tasks—generalization to unseen classes, few-shot classification, cross-dataset transfer, and domain generalization—yielding significant improvements in average performance.

📝 Abstract
Prompt tuning introduces learnable prompt vectors that adapt pretrained vision-language models to downstream tasks in a parameter-efficient manner. However, under limited supervision, prompt tuning alters pretrained representations and drives downstream features away from the pretrained manifold toward directions that are unfavorable for transfer. This drift degrades generalization. To address this limitation, we propose ManiPT, a framework that performs prompt tuning on the pretrained manifold. ManiPT introduces cosine consistency constraints in both the text and image modalities to confine the learned representations within the pretrained geometric neighborhood. Furthermore, we introduce a structural bias that enforces incremental corrections, guiding the adaptation along transferable directions to mitigate reliance on shortcut learning. From a theoretical perspective, ManiPT alleviates overfitting tendencies under limited data. Our experiments cover four downstream settings: unseen-class generalization, few-shot classification, cross-dataset transfer, and domain generalization. Across these settings, ManiPT achieves higher average performance than baseline methods. Notably, ManiPT provides an explicit perspective on how prompt tuning overfits under limited supervision.
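The cosine consistency idea from the abstract, keeping tuned text and image features within the geometric neighborhood of their frozen pretrained counterparts, can be illustrated with a minimal sketch. The function names, loss weights (`lam_t`, `lam_v`), and the exact form of the penalty are assumptions for illustration; the paper's actual formulation is not specified here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cosine_consistency_loss(tuned, frozen):
    """Mean (1 - cos) penalty: zero when tuned features align
    with their frozen pretrained anchors, growing as they drift."""
    return sum(1.0 - cosine(t, f) for t, f in zip(tuned, frozen)) / len(tuned)

def dual_modality_objective(task_loss, text_tuned, text_frozen,
                            img_tuned, img_frozen, lam_t=1.0, lam_v=1.0):
    """Hypothetical combined objective: downstream task loss plus
    consistency terms for both the text and image modalities."""
    return (task_loss
            + lam_t * cosine_consistency_loss(text_tuned, text_frozen)
            + lam_v * cosine_consistency_loss(img_tuned, img_frozen))
```

In practice such penalties constrain direction rather than magnitude, which matches the manifold-neighborhood intuition: features may rescale but should not rotate away from the pretrained geometry.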
Problem

Research questions and friction points this paper is trying to address.

prompt tuning
pretrained manifold
limited supervision
representation drift
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt Tuning
Pretrained Manifold
Cosine Consistency
Structural Bias
CLIP