On the Domain Robustness of Contrastive Vision-Language Models

📅 2025-06-30
🤖 AI Summary
Large-scale pre-trained vision-language models (VLMs) suffer significant robustness degradation under domain shifts—such as atypical imaging conditions or environmental changes—yet existing evaluation protocols fail to model realistic, scenario-specific degradations. Method: We propose Deepbench, an unsupervised robustness evaluation framework tailored for domain transfer. It leverages large language models (LLMs) to generate context-aware, human-annotation-free image perturbations tailored to six real-world deployment domains, integrating LLM-driven zero-shot degradation synthesis with contrastive VLM evaluation. Contribution/Results: Experiments reveal substantial cross-domain robustness disparities among mainstream VLMs, with performance fluctuations exceeding 40% under different degradation types. Deepbench's code and benchmark are publicly released, advancing VLM reliability assessment from generic benchmarking toward deployment-oriented evaluation.

📝 Abstract
In real-world vision-language applications, practitioners increasingly rely on large, pretrained foundation models rather than custom-built solutions, despite limited transparency regarding their training data and processes. While these models achieve impressive performance on general benchmarks, their effectiveness can decline notably under specialized domain shifts, such as unique imaging conditions or environmental variations. In this work, we introduce Deepbench, a framework designed to assess domain-specific robustness of vision-language models (VLMs). Deepbench leverages a large language model (LLM) to generate realistic, context-aware image corruptions tailored to specific deployment domains without requiring labeled data. We evaluate a range of contrastive vision-language architectures and architectural variants across six real-world domains and observe substantial variability in robustness, highlighting the need for targeted, domain-aware evaluation. Deepbench is released as open-source software to support further research into domain-aware robustness assessment.
Problem

Research questions and friction points this paper is trying to address.

Assessing domain robustness of vision-language models
Evaluating performance under specialized domain shifts
Generating realistic corruptions for domain-specific testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM generates realistic image corruptions
No labeled data required for evaluation
Open-source framework for domain robustness
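To make the evaluation idea concrete, here is a minimal sketch of the kind of robustness metric such a framework could report: the relative zero-shot accuracy drop of a VLM per degradation category, and the spread between the mildest and harshest category. This is not Deepbench's actual API; the function names and accuracy figures are illustrative placeholders.

```python
# Illustrative sketch (not the Deepbench API): given a contrastive VLM's
# zero-shot accuracy on clean images and on images corrupted per degradation
# category, report the relative performance drop and the cross-category spread.

def relative_drop(clean_acc: float, corrupted_acc: float) -> float:
    """Relative accuracy drop in percent, normalized by clean performance."""
    return 100.0 * (clean_acc - corrupted_acc) / clean_acc

def robustness_report(clean_acc: float, corrupted: dict[str, float]) -> dict:
    """Summarize per-category drops and their spread across categories."""
    drops = {cat: relative_drop(clean_acc, acc) for cat, acc in corrupted.items()}
    return {
        "per_category_drop": drops,
        # Spread between mildest and harshest degradation category -- the
        # kind of cross-category fluctuation the paper highlights.
        "fluctuation_range": max(drops.values()) - min(drops.values()),
    }

# Placeholder numbers for a hypothetical VLM:
report = robustness_report(
    clean_acc=0.80,
    corrupted={"fog": 0.70, "motion_blur": 0.55, "low_light": 0.40},
)
print(report["fluctuation_range"])  # 50.0 - 12.5 = 37.5
```

Normalizing by clean accuracy (rather than reporting absolute drops) makes models with different baseline performance comparable, which matters when the benchmark spans heterogeneous architectures.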
Mario Koddenbrock
PhD Student at HTW Berlin
Machine Learning · Computer Vision
Rudolf Hoffmann
KI-Werkstatt/Fachbereich 2, University of Applied Sciences Berlin, Wilhelminenhofstr. 75A, 12459 Berlin, Germany
David Brodmann
KI-Werkstatt/Fachbereich 2, University of Applied Sciences Berlin, Wilhelminenhofstr. 75A, 12459 Berlin, Germany
Erik Rodner
University of Applied Sciences (HTW Berlin)
computer vision · machine learning · time series analysis