🤖 AI Summary
Problem: Large pre-trained models served as black-box APIs expose neither their parameters nor their architecture, and adapting them with backpropagation imposes severe GPU memory demands. Method: We propose BlackVIP, a black-box visual prompting method that requires no internal model information. It couples an input-dependent prompt generation mechanism with SPSA-GC gradient estimation, enabling memory-efficient, backpropagation-free adaptation. We further introduce BlackVIP-SE, a lightweight variant, and establish, for the first time, a theoretical connection between visual prompting and the certified robustness of randomized smoothing, formally explaining the improved adversarial robustness. Contribution/Results: Across 19 cross-domain datasets, BlackVIP substantially reduces GPU memory consumption and computational overhead while improving out-of-distribution generalization and adversarial robustness, all without access to model internals or gradients.
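The gradient-free optimization at the core of the method can be sketched as follows, on a toy quadratic loss standing in for the black-box model's objective. This is a minimal illustration, not the paper's implementation: the perturbation scale `c`, the learning rate, and the Nesterov-style look-ahead used here as the "gradient correction" are assumptions, and the paper's exact schedules and update rule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_gradient(loss_fn, theta, c=0.01):
    """Two-point SPSA estimate: only two forward queries of the
    black-box loss are needed, and no backpropagation through the model."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
    diff = loss_fn(theta + c * delta) - loss_fn(theta - c * delta)
    return diff / (2.0 * c) * delta

# Toy "black-box" objective standing in for the model's loss on prompted inputs.
target = np.array([1.0, -2.0, 0.5])
def loss(theta):
    return float(np.sum((theta - target) ** 2))

theta = np.zeros(3)      # stands in for the prompt generator's parameters
momentum = np.zeros(3)
beta, lr = 0.9, 0.05
for _ in range(500):
    # Look-ahead (Nesterov-style) evaluation point as the gradient correction;
    # this is an assumption about SPSA-GC, not the paper's verbatim rule.
    g = spsa_gradient(loss, theta + beta * momentum)
    momentum = beta * momentum - lr * g
    theta = theta + momentum
```

Because each step needs only two loss evaluations regardless of the parameter dimension, no intermediate activations of the served model ever need to be cached, which is the source of the memory savings.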
📝 Abstract
With the surge of large-scale pre-trained models (PTMs), parameter-efficient transfer learning (PETL) has garnered significant attention. While promising, PETL methods commonly rely on two optimistic assumptions: 1) full access to the parameters of a PTM, and 2) sufficient memory to cache all intermediate activations for gradient computation. In most real-world applications, however, PTMs are served as black-box APIs or proprietary software without parameter accessibility, and the large memory requirements of modern PTMs are hard to meet. This work proposes black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of their architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent visual prompts that allow the target PTM to adapt in the wild. SPSA-GC efficiently estimates the gradient of the black-box PTM's objective to update the Coordinator. We also introduce a variant, BlackVIP-SE, which significantly reduces BlackVIP's runtime and computational cost. Extensive experiments on 19 datasets demonstrate that BlackVIPs enable robust adaptation to diverse domains and tasks with minimal memory requirements. Finally, we provide a theoretical analysis of the generalization of visual prompting methods by establishing their connection to the certified robustness of randomized smoothing, along with empirical support for the improved robustness.
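The randomized-smoothing connection the abstract invokes refers to the standard construction of classifying Gaussian-perturbed copies of an input by majority vote; visual prompts are likewise additive input perturbations, which is the bridge the analysis draws. Below is a minimal sketch of that smoothing construction, not the paper's analysis: `predict_fn`, `sigma`, `n`, and `num_classes` are illustrative placeholders for a query-only classifier API and its noise parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def smoothed_predict(predict_fn, image, sigma=0.25, n=100, num_classes=10):
    """Randomized smoothing: classify n Gaussian-perturbed copies of the
    input and return the majority-vote class. Only forward queries of
    predict_fn are used, matching the black-box setting."""
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        noisy = image + sigma * rng.standard_normal(image.shape)
        counts[predict_fn(noisy)] += 1
    return int(np.argmax(counts))
```

In the certified-robustness literature, the margin of this vote yields a provable perturbation radius around the input; the abstract's claim is that prompted classifiers inherit a related robustness benefit.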