🤖 AI Summary
Current out-of-distribution (OOD) generalization evaluation is vulnerable to test-domain contamination, which inflates performance estimates for large vision-language models. Method: We introduce LAION-Natural and LAION-Rendition, the first large-scale benchmark pair that strictly isolates style, enabling web-scale evaluation that cleanly separates natural photographs from artistic renditions. Using CLIP models, we conduct attribution analysis and systematic ablation studies with controlled domain mixing. Results: We expose significant overestimation of OOD generalization by conventional ImageNet-style benchmarks; show that web-scale pretraining amplifies this illusion; confirm that models remain heavily reliant on in-distribution samples; and identify a 1:1 natural-to-rendition mixing ratio that consistently improves cross-domain accuracy by 3.2%. This work delivers a reproducible benchmark, reveals fundamental bottlenecks in the real-world OOD robustness of foundation models, and provides empirically grounded guidance for data-mixing strategies.
📝 Abstract
Out-of-Domain (OOD) generalization is the ability of a model trained on one or more domains to generalize to unseen domains. In the ImageNet era of computer vision, evaluation sets for measuring a model's OOD performance were designed to be strictly OOD with respect to style. However, the emergence of foundation models and expansive web-scale datasets has obfuscated this evaluation process, as datasets cover a broad range of domains and risk test domain contamination. In search of the forgotten domain generalization, we create large-scale datasets subsampled from LAION -- LAION-Natural and LAION-Rendition -- that are strictly OOD to corresponding ImageNet and DomainNet test sets in terms of style. Training CLIP models on these datasets reveals that a significant portion of their performance is explained by in-domain examples. This indicates that the OOD generalization challenges from the ImageNet era still prevail and that training on web-scale data merely creates the illusion of OOD generalization. Furthermore, through a systematic exploration of combining natural and rendition datasets in varying proportions, we identify optimal mixing ratios for model generalization across these domains. Our datasets and results re-enable meaningful assessment of OOD robustness at scale -- a crucial prerequisite for improving model robustness.
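The mixing-ratio exploration described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the function name, signature, and sampling scheme are hypothetical, showing only the idea of subsampling two domain pools at a fixed natural-to-rendition ratio before training.

```python
import random

def mix_domains(natural, rendition, ratio=1.0, total=None, seed=0):
    """Combine samples from two domain pools at a natural:rendition ratio.

    ratio is natural-to-rendition, e.g. 1.0 for the 1:1 mix that the
    experiments identify as working well. Hypothetical helper for
    illustration; a real pipeline would mix image-text pairs at
    web scale rather than in-memory lists.
    """
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    if total is None:
        total = len(natural) + len(rendition)
    # Split the budget according to the requested ratio.
    n_nat = round(total * ratio / (1 + ratio))
    n_ren = total - n_nat
    mixed = (rng.sample(natural, min(n_nat, len(natural)))
             + rng.sample(rendition, min(n_ren, len(rendition))))
    rng.shuffle(mixed)  # interleave domains for training
    return mixed
```

Sweeping `ratio` over a grid (e.g. 4:1, 2:1, 1:1, 1:2, 1:4) while holding `total` fixed is one way to run the kind of controlled domain-mixing ablation the abstract describes.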