Semantic Hierarchical Prompt Tuning for Parameter-Efficient Fine-Tuning

📅 2024-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address semantic fragmentation across prompt layers, disruption of self-attention mechanisms, and insufficient discriminative feature learning in Visual Prompt Tuning (VPT), this paper proposes Semantic Hierarchical Prompt Tuning (SHIP). The method introduces three key innovations: (1) a hierarchical prompt mechanism that explicitly separates semantic-independent and semantic-shared prompts to enable multi-level semantic modeling; (2) attribute-aware prompt embeddings coupled with a prompt-matching loss that sharpen the model's focus on category-specific features; and (3) a decoupled attention module that preserves robustness while improving inference efficiency. Evaluated on the VTAB-1k benchmark with a ViT-B/16 backbone, SHIP achieves a 4.9% absolute accuracy gain over standard VPT, demonstrating substantial improvements in cross-task transferability and parameter efficiency.
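The split between layer-specific and layer-shared prompts can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the prompt counts, initialization scale, and the simple "prepend to the token sequence" scheme are assumptions, and the real method builds its hierarchy adaptively.

```python
import torch
import torch.nn as nn

class HierarchicalPrompts(nn.Module):
    """Hedged sketch of hierarchical prompting: each transformer layer
    receives its own semantic-independent prompts plus a set of
    semantic-shared prompts reused across all layers."""

    def __init__(self, num_layers=12, n_independent=4, n_shared=4, dim=768):
        super().__init__()
        # one prompt bank per layer (semantic-independent)
        self.independent = nn.Parameter(torch.randn(num_layers, n_independent, dim) * 0.02)
        # one prompt bank for all layers (semantic-shared)
        self.shared = nn.Parameter(torch.randn(n_shared, dim) * 0.02)

    def forward(self, x, layer_idx):
        # x: (batch, tokens, dim) token embeddings entering layer `layer_idx`
        b = x.size(0)
        ind = self.independent[layer_idx].unsqueeze(0).expand(b, -1, -1)
        sh = self.shared.unsqueeze(0).expand(b, -1, -1)
        # prepend shared and layer-specific prompts to the token sequence
        return torch.cat([sh, ind, x], dim=1)

prompts = HierarchicalPrompts()
x = torch.randn(2, 197, 768)        # ViT-B/16: 196 patch tokens + [CLS]
out = prompts(x, layer_idx=0)       # (2, 4 + 4 + 197, 768)
```

Only the prompt parameters (and the classification head) would be trained; the backbone stays frozen, which is what makes the approach parameter-efficient.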

📝 Abstract
As the scale of vision models continues to grow, Visual Prompt Tuning (VPT) has emerged as a parameter-efficient transfer learning technique, noted for its superior performance compared to full fine-tuning. However, indiscriminately applying prompts to every layer without considering their inherent correlations can cause significant disturbances, leading to suboptimal transferability. Additionally, VPT disrupts the original self-attention structure, affecting the aggregation of visual features, and lacks a mechanism for explicitly mining discriminative visual features, which are crucial for classification. To address these issues, we propose a Semantic Hierarchical Prompt (SHIP) fine-tuning strategy. We adaptively construct semantic hierarchies and use semantic-independent and semantic-shared prompts to learn hierarchical representations. We also integrate attribute prompts and a prompt matching loss to enhance feature discrimination, and employ decoupled attention for robustness and reduced inference costs. SHIP significantly improves performance, achieving a 4.9% gain in accuracy over VPT with a ViT-B/16 backbone on VTAB-1k tasks. Our code is available at https://github.com/haoweiz23/SHIP.
Problem

Research questions and friction points this paper is trying to address.

Visual Prompt Tuning
Hierarchical Prompts
Attention Structure Preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Hierarchical Prompting
Decoupled Attention Mechanism
Attribute Prompt
Haowei Zhu
Tsinghua University
Fangyuan Zhang
School of Software, Tsinghua University, Beijing, China
Rui Qin
Tsinghua University
Tianxiang Pan
School of Software, Tsinghua University, Beijing, China
Junhai Yong
School of Software, Tsinghua University, Beijing, China
Bin Wang
School of Software, Tsinghua University, Beijing, China