Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey

📅 2024-02-03
🏛️ arXiv.org
📈 Citations: 60
Influential: 2
🤖 AI Summary
To address the computational and memory bottlenecks inherent in full-parameter fine-tuning of billion- or trillion-parameter vision foundation models, this work systematically surveys parameter-efficient fine-tuning (PEFT) methods for vision. It provides a formal definition of visual PEFT and proposes a unified taxonomy comprising three categories: addition-based (e.g., Adapter, VPT), partial-based (e.g., BitFit, LoRA), and unified-based approaches. Covering diverse pre-training paradigms, state-of-the-art methods, standard datasets, and open research challenges, the survey assembles a broad knowledge framework for vision PEFT, accompanied by an open-source repository covering over 100 works. This resource offers both a theoretical foundation and practical guidance for efficient vision transfer learning.
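The summary names LoRA as a representative partial-based (reparameterization) method. As a quick, hypothetical illustration of why such low-rank updates are parameter-efficient (the dimensions below are illustrative ViT-Base sizes, not figures from the paper), one can compare trainable-parameter counts for a single dense layer:

```python
# Hypothetical sketch: trainable-parameter counts for full fine-tuning of a
# d_out x d_in weight matrix W versus a LoRA-style low-rank update W + B @ A,
# where A has shape (rank, d_in) and B has shape (d_out, rank).

def full_ft_params(d_out, d_in):
    """All entries of W are trainable under full fine-tuning."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Only the low-rank factors A and B are trainable; W stays frozen."""
    return rank * d_in + d_out * rank

d = 768  # illustrative hidden size of a ViT-Base layer
full = full_ft_params(d, d)       # 589,824 trainable weights
lora = lora_params(d, d, rank=8)  # 12,288 trainable weights
print(full, lora, round(100 * lora / full, 2))  # 589824 12288 2.08
```

With rank 8, the update trains roughly 2% of the parameters of the frozen weight it modifies, which is the core economy PEFT methods exploit.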

📝 Abstract
Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability across various downstream vision tasks. However, with state-of-the-art PVMs growing to billions or even trillions of parameters, the standard full fine-tuning paradigm is becoming unsustainable due to high computational and storage demands. In response, researchers are exploring parameter-efficient fine-tuning (PEFT), which seeks to exceed the performance of full fine-tuning with minimal parameter modifications. This survey provides a comprehensive overview and future directions for visual PEFT, offering a systematic review of the latest advancements. First, we provide a formal definition of PEFT and discuss model pre-training methods. We then categorize existing methods into three categories: addition-based, partial-based, and unified-based. Finally, we introduce the commonly used datasets and applications and suggest potential future research challenges. A comprehensive collection of resources is available at https://github.com/synbol/Awesome-Parameter-Efficient-Transfer-Learning.
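The abstract's "partial-based" category can be illustrated with a BitFit-style scheme, which trains only bias terms and freezes everything else. The sketch below is a minimal toy model (the layer names and sizes are invented for illustration, not taken from the survey):

```python
# Hypothetical sketch of BitFit-style partial-based tuning: mark only the
# bias parameters of a toy transformer block as trainable.

layers = {
    "attn.weight": {"numel": 768 * 768,  "trainable": False},
    "attn.bias":   {"numel": 768,        "trainable": False},
    "mlp.weight":  {"numel": 768 * 3072, "trainable": False},
    "mlp.bias":    {"numel": 3072,       "trainable": False},
}

def mark_bias_only(params):
    """Freeze all parameters, then unfreeze only the bias terms."""
    for name, p in params.items():
        p["trainable"] = name.endswith(".bias")
    return params

marked = mark_bias_only(layers)
trainable = sum(p["numel"] for p in marked.values() if p["trainable"])
total = sum(p["numel"] for p in marked.values())
print(trainable, total)  # 3840 trainable out of 2952960 total
```

Even in this toy block, the trainable fraction is about 0.13%, which conveys why selective schemes like BitFit sit at the extreme low end of PEFT parameter budgets.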
Problem

Research questions and friction points this paper is trying to address.

Addresses high computational costs in fine-tuning large vision models
Explores parameter-efficient methods to enhance model adaptability
Surveys and categorizes existing efficient fine-tuning techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-efficient fine-tuning for vision models
Categorizes methods into addition, partial, unified
Reviews datasets, applications, future challenges
Yi Xin
California Institute of Technology
Siqi Luo
Shanghai Jiao Tong University
Haodi Zhou
Nanjing University
Junlong Du
Youtu Lab, Tencent
Xiaohong Liu
Shanghai Jiao Tong University
Yue Fan
Beijing Institute for General Artificial Intelligence (BIGAI)
Qing Li
Beijing Institute for General Artificial Intelligence (BIGAI)
Yuntao Du
Purdue University