ImageNet-RIB Benchmark: Large Pre-Training Datasets Don't Guarantee Robustness after Fine-Tuning

📅 2024-10-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether the out-of-distribution (OOD) robustness of pre-trained models survives fine-tuning on downstream tasks. Contrary to the common intuition that stronger pre-training yields better downstream performance, richer and more diverse pre-training data often leads to *worse* OOD robustness after fine-tuning. To study this systematically, the authors introduce ImageNet-RIB, a benchmark for robustness inheritance, and use an optimal-transport distance between the pre-training and downstream datasets to quantify distributional divergence; this distance predicts post-fine-tuning robustness degradation. Through cross-fine-tuning across the benchmark's tasks, they show that standard fine-tuning consistently erodes OOD robustness, while continual learning strategies mitigate the degradation. The work provides a benchmark, an analytical tool, and empirical evidence to guide the design of robust fine-tuning methods.
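The summary above describes an optimal-transport distance between pre-training and downstream data that predicts robustness degradation. As a minimal sketch of the idea (not the authors' implementation), the one-dimensional Wasserstein distance between scalar feature summaries of two datasets can be computed with SciPy; the function name and the 1-D projection are illustrative assumptions.

```python
# Minimal sketch: optimal-transport (earth mover's) distance between
# two feature distributions, in the spirit of the paper's metric.
# The 1-D summaries and function name are illustrative assumptions.
import numpy as np
from scipy.stats import wasserstein_distance

def ot_distance_1d(pretrain_feats: np.ndarray, downstream_feats: np.ndarray) -> float:
    """1-D Wasserstein distance between scalar feature summaries
    of the pre-training and downstream datasets."""
    return wasserstein_distance(pretrain_feats, downstream_feats)

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 1000)  # stand-in for pre-training features
b = rng.normal(0.5, 1.0, 1000)  # stand-in for downstream features
print(ot_distance_1d(a, a))     # identical samples -> 0.0
print(ot_distance_1d(a, b))     # shifted samples -> positive distance
```

In the paper's setting, a larger distance between pre-training and downstream data corresponds to greater post-fine-tuning degradation; real features would be high-dimensional, requiring a full OT solver rather than this 1-D special case.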

📝 Abstract
Highly performant large-scale pre-trained models promise to also provide a valuable foundation for learning specialized tasks, by fine-tuning the model to the desired task. By starting from a good general-purpose model, the goal is to achieve both specialization in the target task and maintain robustness. To assess the robustness of models on out-of-distribution samples after fine-tuning on downstream datasets, we introduce a new robust fine-tuning benchmark, ImageNet-RIB (Robustness Inheritance Benchmark). The benchmark consists of a set of related but distinct specialized (downstream) datasets; pre-trained models are fine-tuned on one dataset in the set and their robustness is assessed on the rest, iterating across all tasks for fine-tuning and assessment. The distance between the pre-training and downstream datasets, measured by optimal transport, predicts this performance degradation on the pre-training dataset. Though continual learning methods help maintain robustness, fine-tuning generally reduces generalization performance on related downstream tasks across models. Counterintuitively, model robustness after fine-tuning on related downstream tasks is the worst when the pre-training dataset is the richest and the most diverse. This suggests that starting with the strongest foundation model is not necessarily the best approach for performance on specialist tasks. ImageNet-RIB thus offers key insights for developing more resilient fine-tuning strategies and building robust machine learning models. https://jd730.github.io/projects/ImageNet-RIB
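The abstract describes the benchmark protocol: fine-tune a pre-trained model on one dataset in the suite and assess robustness on the rest, iterating over all tasks. A runnable sketch of that loop is below; the dataset names are examples of ImageNet OOD variants (not necessarily the benchmark's exact composition), and `fine_tune`/`evaluate` are hypothetical placeholders.

```python
# Illustrative sketch of the ImageNet-RIB evaluation protocol:
# fine-tune on each dataset in turn, then measure performance on
# all remaining datasets. fine_tune/evaluate are placeholders.
def run_rib_protocol(datasets, pretrained_model, fine_tune, evaluate):
    results = {}
    for target in datasets:
        model = fine_tune(pretrained_model, target)   # specialize on one task
        held_out = [d for d in datasets if d != target]
        results[target] = {d: evaluate(model, d) for d in held_out}
    return results

# Toy stand-ins to make the sketch runnable; names are examples only.
datasets = ["ImageNet-R", "ImageNet-Sketch", "ObjectNet"]
results = run_rib_protocol(
    datasets,
    pretrained_model="base",
    fine_tune=lambda m, d: f"{m}->{d}",
    evaluate=lambda m, d: 0.5,
)
print(results["ImageNet-R"])  # scores on the two held-out datasets
```

The paper's finding is that, across models, the held-out scores in this loop drop after fine-tuning, and drop most when the pre-training data is richest.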
Problem

Research questions and friction points this paper is trying to address.

How well do pre-trained models retain OOD robustness after fine-tuning on downstream tasks?
How can robustness inheritance under fine-tuning be evaluated systematically?
Does richer, more diverse pre-training data help or hurt robustness after fine-tuning?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the ImageNet-RIB benchmark for robustness inheritance
Uses optimal-transport distance between datasets to predict robustness degradation
Shows that the richest, most diverse pre-training data can yield the worst post-fine-tuning robustness