Visual Prompt-Agnostic Evolution

📅 2026-01-28
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of existing Visual Prompt Tuning (VPT) methods, which often suffer from gradient oscillations, premature convergence in shallow layers, and high variance in deeper layers, leading to cross-layer inconsistency, slow convergence, and degraded performance. To overcome these issues, we propose a lightweight and general-purpose enhancement framework that initializes task-aware prompt directions via frequency-domain analysis, employs a globally shared Koopman operator to model the dynamic evolution of prompts for consistent cross-layer updates, and incorporates a Lyapunov stability theory–inspired regularizer to suppress error amplification. Notably, our approach requires no modifications to the backbone network or inference pipeline and is compatible with various VPT variants. Experiments across 25 downstream tasks demonstrate an average 1.41× acceleration in convergence and a 1–3% improvement in accuracy.

๐Ÿ“ Abstract
Visual Prompt Tuning (VPT) adapts a frozen Vision Transformer (ViT) to downstream tasks by inserting a small number of learnable prompt tokens into the token sequence at each layer. However, we observe that existing VPT variants often suffer from unstable training dynamics, characterized by gradient oscillations. A layer-wise analysis reveals that shallow-layer prompts tend to stagnate early, while deeper-layer prompts exhibit high-variance oscillations, leading to cross-layer mismatch. These issues slow convergence and degrade final performance. To address these challenges, we propose Prompt-Agnostic Evolution ($\mathtt{PAE}$), which strengthens vision prompt tuning by explicitly modeling prompt dynamics. From a frequency-domain perspective, we initialize prompts in a task-aware direction by uncovering and propagating frequency shortcut patterns that the backbone inherently exploits for recognition. To ensure coherent evolution across layers, we employ a shared Koopman operator that imposes a global linear transformation instead of uncoordinated, layer-specific updates. Finally, inspired by Lyapunov stability theory, we introduce a regularizer that constrains error amplification during evolution. Extensive experiments show that $\mathtt{PAE}$ accelerates convergence with an average $1.41\times$ speedup and improves accuracy by 1-3% on 25 datasets across multiple downstream tasks. Beyond performance, $\mathtt{PAE}$ is prompt-agnostic and lightweight, and it integrates seamlessly with diverse VPT variants without backbone modification or inference-time changes.
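The three ingredients described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the frequency-bin selection, the operator `K`, and the helpers `evolve` and `lyapunov_reg` are simplified stand-ins chosen only to show the shape of the idea (frequency-seeded prompts, one shared linear map applied at every layer, and a penalty on energy growth across layers).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_prompts, n_layers = 16, 4, 6

# Frequency-domain initialization (assumption: seed prompts from the
# strongest frequency bins of some sample patch features).
features = rng.standard_normal((64, d))
spectrum = np.fft.rfft(features, axis=0)          # (33, d) complex
power = np.abs(spectrum).sum(axis=1)              # energy per frequency bin
top = np.argsort(power)[-n_prompts:]              # dominant bins
prompts0 = np.real(spectrum[top]) / np.sqrt(features.shape[0])  # (n_prompts, d)

# Shared Koopman-style operator: a single linear map K evolves the prompts
# from one layer to the next, instead of independent per-layer updates.
K = np.eye(d) + 0.01 * rng.standard_normal((d, d))

def evolve(p0, K, n_layers):
    """Roll the shared linear map forward to get prompts for every layer."""
    layers = [p0]
    for _ in range(n_layers - 1):
        layers.append(layers[-1] @ K.T)
    return layers

layers = evolve(prompts0, K, n_layers)

# Lyapunov-inspired regularizer (illustrative): treat V(p) = ||p||^2 as an
# energy function and penalize only its increases across consecutive layers,
# discouraging error amplification during evolution.
def lyapunov_reg(layers):
    penalty = 0.0
    for a, b in zip(layers, layers[1:]):
        penalty += max(0.0, float(np.sum(b**2) - np.sum(a**2)))
    return penalty

reg = lyapunov_reg(layers)
```

In training, `reg` would be added to the task loss so that gradients shape `K` toward stable (non-expanding) prompt dynamics; here it is just computed on the rolled-out prompts.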
Problem

Research questions and friction points this paper is trying to address.

Visual Prompt Tuning
training instability
gradient oscillations
cross-layer mismatch
convergence slowdown
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt-Agnostic Evolution
Visual Prompt Tuning
Koopman Operator
Frequency-domain Initialization
Lyapunov Stability
🔎 Similar Papers
No similar papers found.