DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the Base-New Trade-off (BNT) problem in CLIP prompt tuning, where gains on base classes come at the expense of generalization to novel (unseen) classes, this paper proposes the first prompt-level method that fully decouples their optimization pathways. The approach introduces: (1) a dual-prompt collaborative architecture that separately models semantic representations for base and novel classes; (2) a weighting-decoupling mechanism coupled with a dynamic hard negative optimizer to mitigate gradient conflicts; and (3) a theoretical proof of channel-wise invariance of prompt vectors in the feature space. The method requires no external knowledge or model expansion. Extensive experiments demonstrate significant improvements in base-class accuracy across multiple vision backbones while preserving strong generalization to novel classes. Code is publicly available.

📝 Abstract
The Base-New Trade-off (BNT) problem universally exists during the optimization of CLIP-based prompt tuning, where continuous fine-tuning on base (target) classes leads to a simultaneous decrease of generalization ability on new (unseen) classes. Existing approaches attempt to regulate the prompt tuning process to balance the BNT by appending constraints. However, imposed on the same target prompt, these constraints fail to fully avert the mutual exclusivity between the optimization directions for base and new classes. As a novel solution to this challenge, we propose the plug-and-play Dual-Prompt Collaboration (DPC) framework, the first to decouple the optimization processes of the base and new tasks at the prompt level. Specifically, we clone a learnable parallel prompt from the backbone prompt and introduce a variable Weighting-Decoupling framework to independently control the optimization directions of the dual prompts for the base and new tasks, thus avoiding the conflict in generalization. Meanwhile, we propose a Dynamic Hard Negative Optimizer, which uses the dual prompts to construct a more challenging optimization task on base classes for enhancement. For interpretability, we prove the feature channel invariance of the prompt vector during optimization, providing theoretical support for the Weighting-Decoupling of DPC. Extensive experiments on multiple backbones demonstrate that DPC significantly improves base performance without introducing any external knowledge beyond the base classes, while maintaining generalization to new classes. Code is available at: https://github.com/JREion/DPC.
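The cloning-and-weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); all names here are illustrative, the "tuned" parallel prompt is simulated with random noise, and the weighting is shown as a simple convex combination of the two prompt tensors.

```python
import numpy as np

# Hypothetical sketch of DPC's dual-prompt Weighting-Decoupling idea.
# A backbone prompt is cloned into a parallel prompt that is further
# optimized on base classes; a weight w then selects which prompt
# (or mixture) is used at inference time for each task.

rng = np.random.default_rng(0)
ctx_len, dim = 4, 8  # context length and embedding dimension (illustrative)

backbone_prompt = rng.normal(size=(ctx_len, dim))  # tuned once, then kept for new classes
# Stand-in for further base-class tuning of the cloned parallel prompt:
parallel_prompt = backbone_prompt + 0.1 * rng.normal(size=(ctx_len, dim))

def weighted_prompt(w: float) -> np.ndarray:
    """Mix the dual prompts: w -> 1 favors the base-tuned parallel prompt,
    w -> 0 falls back to the backbone prompt for new-class generalization."""
    return w * parallel_prompt + (1.0 - w) * backbone_prompt

base_prompt = weighted_prompt(1.0)  # base-class inference path
new_prompt = weighted_prompt(0.0)   # new-class inference path

assert np.allclose(new_prompt, backbone_prompt)
assert np.allclose(base_prompt, parallel_prompt)
```

Because the two inference paths read from different (mixtures of) prompts, gradient updates aimed at base classes never move the prompt used for new classes, which is the decoupling the abstract describes.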
Problem

Research questions and friction points this paper is trying to address.

Balancing base and new class generalization in CLIP-based prompt tuning.
Decoupling optimization processes for base and new tasks at prompt level.
Enhancing base class performance without losing new class generalization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Prompt Collaboration decouples base and new tasks.
Weighting-Decoupling framework controls dual prompts independently.
Dynamic Hard Negative Optimizer enhances base class performance.
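The hard-negative idea behind the third bullet can be sketched as follows. This is an assumption about the general technique, not the paper's exact procedure: for each image, only the k non-target classes with the highest similarity logits are kept, so the contrastive task on base classes becomes harder.

```python
import numpy as np

def hard_negative_logits(logits: np.ndarray, target: int, k: int) -> np.ndarray:
    """Keep the target-class logit plus the k most confusing negatives.
    Illustrative sketch only; names and selection rule are assumptions."""
    negatives = np.delete(np.arange(logits.size), target)
    hardest = negatives[np.argsort(logits[negatives])[-k:]]  # top-k by similarity
    keep = np.concatenate(([target], hardest))
    return logits[keep]  # position 0 is always the target class

# Toy example: class 0 is the target; classes 1 and 3 are its hardest negatives.
logits = np.array([2.0, 5.0, 1.0, 4.5, 0.5])
sub = hard_negative_logits(logits, target=0, k=2)
# sub holds the target logit plus the two hardest negatives: [2.0, 4.5, 5.0]
```

Training a softmax cross-entropy loss over `sub` (with label 0) forces the model to separate the target from its most similar competitors rather than from easy, distant classes.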
Haoyang Li
University of Technology Sydney
Liang Wang
Shanghai University
Chao Wang
Shanghai University
Jing Jiang
University of Technology Sydney
Yan Peng
Professor, Shanghai University
Robotics
Guodong Long
Associate Professor, Faculty of Engineering and IT, University of Technology Sydney
Federated Learning · Foundation Models · Federated Intelligence · Foundation Agents · Digital Health