Prompt-based Adaptation in Large-scale Vision Models: A Survey

📅 2025-10-15

🤖 AI Summary

Visual prompting (VP) and visual prompt tuning (VPT) are often used interchangeably, and this terminological inconsistency hinders the systematic advancement of prompt-based adaptation methods. To address this, the survey proposes Prompt-based Adaptation (PA), a unified conceptual framework that distinguishes VP, which injects prompts at the input (pixel) level, from VPT, which injects learnable prompt tokens inside the model while the pretrained backbone stays frozen. PA establishes a taxonomy spanning learnable, generative, and non-learnable prompts, organized by pixel-level and token-level injection granularity. The survey covers applications in medical imaging, 3D point clouds, and vision-language tasks, connecting PA to prompt engineering, parameter-efficient fine-tuning, test-time adaptation, and trustworthy AI. It clarifies conceptual boundaries, summarizes current benchmarks, and lays out key challenges and future directions for lightweight adaptation of large-scale vision models.

📝 Abstract

In computer vision, Visual Prompting (VP) and Visual Prompt Tuning (VPT) have recently emerged as lightweight and effective alternatives to full fine-tuning for adapting large-scale vision models within the "pretrain-then-finetune" paradigm. However, despite rapid progress, their conceptual boundaries remain blurred: VP and VPT are frequently used interchangeably in current research, reflecting a lack of systematic distinction between these techniques and their respective applications. In this survey, we revisit the designs of VP and VPT from first principles and conceptualize them within a unified framework termed Prompt-based Adaptation (PA). We provide a taxonomy that categorizes existing methods into learnable, generative, and non-learnable prompts, and further organizes them by injection granularity: pixel-level and token-level. Beyond the core methodologies, we examine PA's integrations across diverse domains, including medical imaging, 3D point clouds, and vision-language tasks, as well as its role in test-time adaptation and trustworthy AI. We also summarize current benchmarks and identify key challenges and future directions. To the best of our knowledge, this is the first comprehensive survey dedicated to PA's methodologies and applications in light of their distinct characteristics. Our survey aims to provide a clear roadmap for researchers and practitioners in all areas to understand and explore the evolving landscape of PA-related research.
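
The two injection granularities named in the abstract can be illustrated with a toy sketch. The abstract does not specify any mechanics, so everything below is an assumption for illustration: the function names `apply_pixel_prompt` and `apply_token_prompt` are hypothetical, the pixel-level prompt is modeled as an additive border frame, and the token-level prompt is modeled as extra tokens prepended to the patch-embedding sequence. In a real setup the prompts would be trainable parameters and the backbone would be a pretrained model; here plain Python lists stand in for tensors.

```python
from typing import List

Image = List[List[float]]   # toy H x W single-channel image
Tokens = List[List[float]]  # toy N x D sequence of patch embeddings


def apply_pixel_prompt(image: Image, prompt: Image, border: int) -> Image:
    """Pixel-level injection (assumed form): add `prompt` on a border frame.

    `prompt` has the same H x W shape as `image`. Only a frame of width
    `border` pixels is modified; the interior of the image is left untouched.
    """
    h, w = len(image), len(image[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            on_border = (
                i < border or i >= h - border or j < border or j >= w - border
            )
            row.append(image[i][j] + (prompt[i][j] if on_border else 0.0))
        out.append(row)
    return out


def apply_token_prompt(patch_tokens: Tokens, prompt_tokens: Tokens) -> Tokens:
    """Token-level injection (assumed form): prepend prompt tokens.

    The image itself is unchanged; the sequence fed to the model grows
    from N tokens to P + N tokens.
    """
    dim = len(patch_tokens[0])
    if any(len(p) != dim for p in prompt_tokens):
        raise ValueError("prompt tokens must match the embedding dimension")
    return [list(p) for p in prompt_tokens] + [list(t) for t in patch_tokens]


# Pixel-level: the input keeps its shape, but its border values change.
image = [[1.0] * 4 for _ in range(4)]
pixel_prompt = [[0.5] * 4 for _ in range(4)]
prompted_image = apply_pixel_prompt(image, pixel_prompt, border=1)

# Token-level: the input image is untouched, but the token sequence is longer.
patch_tokens = [[0.0] * 8 for _ in range(4)]
prompt_tokens = [[0.1] * 8 for _ in range(2)]
prompted_tokens = apply_token_prompt(patch_tokens, prompt_tokens)
```

The contrast to notice is where the extra values live: the pixel-level variant changes the input while keeping its shape, whereas the token-level variant leaves the input alone and lengthens the sequence the model processes.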

Problem

Research questions and friction points this paper is trying to address.

Clarifying conceptual boundaries between Visual Prompting and Visual Prompt Tuning

Systematically categorizing prompt-based adaptation methods across domains

Providing a unified framework for lightweight adaptation of vision models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for prompt-based adaptation

Taxonomy categorizing learnable, generative, and non-learnable prompts

Examining integrations across diverse application domains
