Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the prevalent cross-task concept forgetting in fine-tuning vision foundation models, this paper proposes a neighborhood-graph-based feature distribution alignment method. The approach explicitly models structural distribution shifts by constructing k-nearest-neighbor (k-NN) graphs in both the pre-trained and fine-tuned feature spaces. It dynamically generates discriminative proxy points to enhance intra-class compactness and inter-class separability, thereby improving structural preservation. The authors further introduce a graph-guided distribution alignment regularization term that is compatible with diverse fine-tuning paradigms, including end-to-end, few-shot, and continual learning. Extensive experiments demonstrate that the method significantly mitigates concept forgetting across image classification, image captioning, and visual question answering tasks. It consistently improves cross-task generalization, and quantitative analysis reveals a strong correlation between forgetting magnitude and feature distribution distance.

📝 Abstract
Vision foundation models pre-trained on massive data encode rich representations of real-world concepts, which can be adapted to downstream tasks by fine-tuning. However, fine-tuning foundation models on one task often leads to the issue of concept forgetting on other tasks. Recent methods of robust fine-tuning aim to mitigate forgetting of prior knowledge without affecting the fine-tuning performance. Knowledge is often preserved by matching the original and fine-tuned model weights or feature pairs. However, such point-wise matching can be too strong, without explicit awareness of the feature neighborhood structures that encode rich knowledge as well. We propose a novel regularization method Proxy-FDA that explicitly preserves the structural knowledge in feature space. Proxy-FDA performs Feature Distribution Alignment (using nearest neighbor graphs) between the pre-trained and fine-tuned feature spaces, and the alignment is further improved by informative proxies that are generated dynamically to increase data diversity. Experiments show that Proxy-FDA significantly reduces concept forgetting during fine-tuning, and we find a strong correlation between forgetting and a distributional distance metric (in comparison to L2 distance). We further demonstrate Proxy-FDA's benefits in various fine-tuning settings (end-to-end, few-shot and continual tuning) and across different tasks like image classification, captioning and VQA.
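The abstract's core idea, comparing nearest-neighbor graphs between the pre-trained and fine-tuned feature spaces, can be illustrated with a minimal NumPy sketch. The function names and the disagreement-based penalty below are illustrative assumptions, not the paper's actual loss, which aligns full feature distributions rather than binary graphs:

```python
import numpy as np

def knn_graph(feats, k):
    """Binary k-NN adjacency matrix from pairwise Euclidean distances."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-matches
    idx = np.argsort(d, axis=1)[:, :k]     # k nearest neighbors per point
    adj = np.zeros(d.shape, dtype=bool)
    np.put_along_axis(adj, idx, True, axis=1)
    return adj

def fda_penalty(pre_feats, ft_feats, k=3):
    """Fraction of neighbor relations that disagree between the
    pre-trained and fine-tuned feature spaces (hypothetical proxy
    for a structural-forgetting regularizer)."""
    return np.mean(knn_graph(pre_feats, k) != knn_graph(ft_feats, k))
```

If fine-tuning perfectly preserves neighborhood structure, the penalty is zero; the more neighbor relations shift, the larger it grows, matching the paper's observation that forgetting correlates with a distributional distance rather than a point-wise L2 distance.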
Problem

Research questions and friction points this paper is trying to address.

Mitigates concept forgetting in fine-tuned vision foundation models
Aligns feature distributions using dynamic proxies and neighborhood structures
Improves performance across image classification, captioning, and VQA tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proxy-FDA aligns feature distributions using nearest neighbor graphs
Dynamic proxies enhance data diversity for better alignment
Reduces concept forgetting in various fine-tuning settings
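The "dynamic proxies" idea in the bullets above can also be sketched. One simple way to densify a feature distribution is to interpolate each feature with a randomly chosen nearest neighbor; this is an illustrative stand-in under assumed details, since the paper's proxy generation is more elaborate (its proxies are discriminative and generated dynamically during tuning):

```python
import numpy as np

def generate_proxies(feats, k=3, alpha=0.5, rng=None):
    """Synthesize one proxy per feature by mixing it with a random
    k-NN neighbor (hypothetical stand-in for Proxy-FDA's proxies)."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                       # no self-neighbors
    nbrs = np.argsort(d, axis=1)[:, :k]               # k-NN indices per point
    pick = nbrs[np.arange(len(feats)), rng.integers(0, k, len(feats))]
    lam = rng.uniform(0.0, alpha, size=(len(feats), 1))  # mixing weight
    return (1.0 - lam) * feats + lam * feats[pick]
```

The generated proxies lie between existing points, so they increase data diversity for the alignment step without leaving the local neighborhood structure that the regularizer is trying to preserve.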