Semantic Hierarchical Prompt Tuning for Parameter-Efficient Fine-Tuning

📅 2024-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address semantic fragmentation across prompt layers, disruption of self-attention mechanisms, and insufficient discriminative feature learning in Visual Prompt Tuning (VPT), this paper proposes Semantic Hierarchical Prompt Tuning (SHIP). The method introduces three key innovations: (1) a hierarchical prompt mechanism that explicitly separates semantic-independent and semantic-shared prompts to enable multi-level semantic modeling; (2) attribute-aware prompt embeddings coupled with a prompt-matching loss that sharpen the model's focus on category-specific features; and (3) a decoupled attention module that preserves robustness while improving inference efficiency. Evaluated on the VTAB-1k benchmark with a ViT-B/16 backbone, SHIP achieves a 4.9% absolute accuracy gain over standard VPT, demonstrating substantial improvements in cross-task transferability and parameter efficiency.
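The split between layer-specific and layer-shared prompts can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the prompt counts, initialization scale, and the simple "prepend to the token sequence" scheme are assumptions, and the real method builds its hierarchy adaptively.

```python
import torch
import torch.nn as nn

class HierarchicalPrompts(nn.Module):
    """Hedged sketch of hierarchical prompting: each transformer layer
    receives its own semantic-independent prompts plus a set of
    semantic-shared prompts reused across all layers."""

    def __init__(self, num_layers=12, n_independent=4, n_shared=4, dim=768):
        super().__init__()
        # one prompt bank per layer (semantic-independent)
        self.independent = nn.Parameter(torch.randn(num_layers, n_independent, dim) * 0.02)
        # one prompt bank for all layers (semantic-shared)
        self.shared = nn.Parameter(torch.randn(n_shared, dim) * 0.02)

    def forward(self, x, layer_idx):
        # x: (batch, tokens, dim) token embeddings entering layer `layer_idx`
        b = x.size(0)
        ind = self.independent[layer_idx].unsqueeze(0).expand(b, -1, -1)
        sh = self.shared.unsqueeze(0).expand(b, -1, -1)
        # prepend shared and layer-specific prompts to the token sequence
        return torch.cat([sh, ind, x], dim=1)

prompts = HierarchicalPrompts()
x = torch.randn(2, 197, 768)        # ViT-B/16: 196 patch tokens + [CLS]
out = prompts(x, layer_idx=0)       # (2, 4 + 4 + 197, 768)
```

Only the prompt parameters (and the classification head) would be trained; the backbone stays frozen, which is what makes the approach parameter-efficient.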

📝 Abstract
As the scale of vision models continues to grow, Visual Prompt Tuning (VPT) has emerged as a parameter-efficient transfer learning technique, noted for its superior performance compared to full fine-tuning. However, indiscriminately applying prompts to every layer without considering their inherent correlations can cause significant disturbances, leading to suboptimal transferability. Additionally, VPT disrupts the original self-attention structure, affecting the aggregation of visual features, and lacks a mechanism for explicitly mining discriminative visual features, which are crucial for classification. To address these issues, we propose a Semantic Hierarchical Prompt (SHIP) fine-tuning strategy. We adaptively construct semantic hierarchies and use semantic-independent and semantic-shared prompts to learn hierarchical representations. We also integrate attribute prompts and a prompt matching loss to enhance feature discrimination, and employ decoupled attention for robustness and reduced inference costs. SHIP significantly improves performance, achieving a 4.9% gain in accuracy over VPT with a ViT-B/16 backbone on VTAB-1k tasks. Our code is available at https://github.com/haoweiz23/SHIP.
Problem

Research questions and friction points this paper is trying to address.

Visual Prompt Tuning
Hierarchical Prompts
Attention Structure Preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Hierarchical Prompting
Decoupled Attention Mechanism
Attribute Prompt
Haowei Zhu
Tsinghua University
Fangyuan Zhang
School of Software, Tsinghua University, Beijing, China
Rui Qin
Tsinghua University
Tianxiang Pan
School of Software, Tsinghua University, Beijing, China
Junhai Yong
School of Software, Tsinghua University, Beijing, China
Bin Wang
School of Software, Tsinghua University, Beijing, China