FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation

📅 2025-05-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the inherent trade-offs among generalization, training efficiency, and parameter efficiency in prompt learning for vision-language models (VLMs)—particularly weak zero-shot recognition and insufficient region-level semantic modeling—this paper proposes a region-aware dual-space prompt learning paradigm. The method introduces a mutual learning mechanism between positive and negative prompt spaces to enable soft-supervised cross-stage contextual sharing; incorporates region-wise random cropping for multi-granularity modeling; adopts joint similarity-dissimilarity learning; and employs offline teacher knowledge caching with I/O acceleration. Evaluated on 11 benchmarks, the approach significantly improves base-to-new generalization, cross-dataset transfer, and robustness. It achieves state-of-the-art zero-shot performance while accelerating training by 2.2×, thereby reconciling strong generalization with high parameter and training efficiency.
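The "offline teacher knowledge caching" idea in the summary can be illustrated with a minimal sketch: run the expensive teacher model once per sample, persist its outputs to disk, and serve all later epochs from the cache instead of re-running teacher inference. The class name `TeacherLogitCache` and the callable `teacher_fn` are illustrative assumptions, not names from the paper.

```python
import os
import pickle
import tempfile

class TeacherLogitCache:
    """Sketch of offline teacher caching: the (expensive) teacher
    forward pass runs once per sample; later epochs read from disk."""

    def __init__(self, cache_dir, teacher_fn):
        self.cache_dir = cache_dir
        self.teacher_fn = teacher_fn  # expensive teacher forward pass (assumed signature)
        self.calls = 0                # counts real teacher invocations

    def _path(self, key):
        return os.path.join(self.cache_dir, f"{key}.pkl")

    def get(self, key, image):
        path = self._path(key)
        if os.path.exists(path):      # cache hit: cheap disk read, no teacher call
            with open(path, "rb") as f:
                return pickle.load(f)
        self.calls += 1               # cache miss: run the teacher once
        logits = self.teacher_fn(image)
        with open(path, "wb") as f:   # persist for all subsequent epochs
            pickle.dump(logits, f)
        return logits

# usage: a stand-in teacher that "computes" logits from an image vector
cache_dir = tempfile.mkdtemp()
teacher = lambda img: [x * 0.5 for x in img]
cache = TeacherLogitCache(cache_dir, teacher)

for epoch in range(3):                # three epochs over the same sample
    out = cache.get("img0", [2.0, 4.0])

print(cache.calls)  # → 1: the teacher ran only once despite three epochs
```

The paper additionally accelerates the I/O path itself; this sketch only shows the caching structure that removes repeated teacher inference.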

📝 Abstract
Prompt learning is a parameter-efficient method that has been widely adopted to adapt Vision-Language Models (VLMs) to downstream tasks. While hard-prompt design requires domain expertise and iterative optimization, soft-prompt methods rely heavily on task-specific hard labels, limiting their generalization to unseen categories. Recent distillation-based prompt learning methods improve generalization by exploiting larger teacher VLMs and unsupervised knowledge transfer, yet their repeated online teacher inference sacrifices the inherent training-efficiency advantage of prompt learning. In this paper, we propose Faster Distillation-Based Prompt Learning (FDBPL), which addresses these issues by sharing soft supervision contexts across multiple training stages and implementing accelerated I/O. Furthermore, FDBPL introduces a region-aware prompt learning paradigm with dual positive-negative prompt spaces to fully exploit randomly cropped regions containing multi-level information. We propose a positive-negative space mutual learning mechanism based on similarity-difference learning, enabling student CLIP models to recognize correct semantics while learning to reject weakly related concepts, thereby improving zero-shot performance. Unlike existing distillation-based prompt learning methods that sacrifice parameter efficiency for generalization, FDBPL maintains the dual advantages of parameter efficiency and strong downstream generalization. Comprehensive evaluations across 11 datasets demonstrate superior performance in base-to-new generalization, cross-dataset transfer, and robustness tests, achieving 2.2× faster training speed.
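The similarity-difference idea in the abstract (align with the correct prompt, reject weakly related ones) can be sketched as a generic hinge-style contrastive objective over cosine similarities. This is not the paper's exact loss; the function name `pos_neg_loss` and the `margin` parameter are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pos_neg_loss(image_feat, pos_prompt_feat, neg_prompt_feat, margin=0.2):
    """Hinge-style sketch of similarity-difference learning: reward
    alignment with the positive prompt embedding and penalize the image
    embedding for sitting within `margin` of the negative prompt."""
    sim_pos = cosine(image_feat, pos_prompt_feat)
    sim_neg = cosine(image_feat, neg_prompt_feat)
    # zero loss once the positive similarity beats the negative by the margin
    return max(0.0, margin + sim_neg - sim_pos)

# usage: an image embedding close to the positive prompt incurs no loss
img, pos, neg = [1.0, 0.0], [1.0, 0.1], [0.0, 1.0]
print(pos_neg_loss(img, pos, neg))  # → 0.0 (positive wins by more than the margin)
```

A margin formulation is only one way to express "recognize correct semantics while rejecting weakly related concepts"; the paper's mutual learning mechanism operates over dual prompt spaces across training stages.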
Problem

Research questions and friction points this paper is trying to address.

Improves efficiency of distillation-based prompt learning
Enhances generalization with region-aware dual prompt spaces
Maintains parameter efficiency while boosting zero-shot performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Faster distillation-based prompt learning method
Region-aware dual positive-negative prompt spaces
Similarity-difference mutual learning mechanism