On the Domain Robustness of Contrastive Vision-Language Models

📅 2025-06-30
🤖 AI Summary
Large-scale pre-trained vision-language models (VLMs) suffer significant robustness degradation under domain shifts—such as atypical imaging conditions or environmental changes—yet existing evaluation protocols fail to model realistic, scenario-specific degradations. Method: We propose Deepbench, an unsupervised robustness evaluation framework tailored for domain transfer. It leverages large language models (LLMs) to generate context-aware, human-annotation-free image perturbations tailored to six real-world deployment domains, integrating LLM-driven zero-shot degradation synthesis with contrastive VLM evaluation. Contribution/Results: Experiments reveal substantial cross-domain robustness disparities among mainstream VLMs, with performance fluctuations exceeding 40% under different degradation types. Deepbench's code and benchmark are publicly released, advancing VLM reliability assessment from generic benchmarking toward deployment-oriented evaluation.

📝 Abstract
In real-world vision-language applications, practitioners increasingly rely on large, pretrained foundation models rather than custom-built solutions, despite limited transparency regarding their training data and processes. While these models achieve impressive performance on general benchmarks, their effectiveness can decline notably under specialized domain shifts, such as unique imaging conditions or environmental variations. In this work, we introduce Deepbench, a framework designed to assess domain-specific robustness of vision-language models (VLMs). Deepbench leverages a large language model (LLM) to generate realistic, context-aware image corruptions tailored to specific deployment domains without requiring labeled data. We evaluate a range of contrastive vision-language architectures and architectural variants across six real-world domains and observe substantial variability in robustness, highlighting the need for targeted, domain-aware evaluation. Deepbench is released as open-source software to support further research into domain-aware robustness assessment.
Problem

Research questions and friction points this paper is trying to address.

Assessing domain robustness of vision-language models
Evaluating performance under specialized domain shifts
Generating realistic corruptions for domain-specific testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM generates realistic image corruptions
No labeled data required for evaluation
Open-source framework for domain robustness
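To make the evaluation idea concrete, here is a minimal sketch of the kind of robustness metric such a framework could report: the relative zero-shot accuracy drop of a VLM per degradation category, and the spread between the mildest and harshest category. This is not Deepbench's actual API; the function names and accuracy figures are illustrative placeholders.

```python
# Illustrative sketch (not the Deepbench API): given a contrastive VLM's
# zero-shot accuracy on clean images and on images corrupted per degradation
# category, report the relative performance drop and the cross-category spread.

def relative_drop(clean_acc: float, corrupted_acc: float) -> float:
    """Relative accuracy drop in percent, normalized by clean performance."""
    return 100.0 * (clean_acc - corrupted_acc) / clean_acc

def robustness_report(clean_acc: float, corrupted: dict[str, float]) -> dict:
    """Summarize per-category drops and their spread across categories."""
    drops = {cat: relative_drop(clean_acc, acc) for cat, acc in corrupted.items()}
    return {
        "per_category_drop": drops,
        # Spread between mildest and harshest degradation category -- the
        # kind of cross-category fluctuation the paper highlights.
        "fluctuation_range": max(drops.values()) - min(drops.values()),
    }

# Placeholder numbers for a hypothetical VLM:
report = robustness_report(
    clean_acc=0.80,
    corrupted={"fog": 0.70, "motion_blur": 0.55, "low_light": 0.40},
)
print(report["fluctuation_range"])  # 50.0 - 12.5 = 37.5
```

Normalizing by clean accuracy (rather than reporting absolute drops) makes models with different baseline performance comparable, which matters when the benchmark spans heterogeneous architectures.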
Mario Koddenbrock
PhD Student at HTW Berlin
Machine Learning · Computer Vision
Rudolf Hoffmann
KI-Werkstatt/Fachbereich 2, University of Applied Sciences Berlin, Wilhelminenhofstr. 75A, 12459 Berlin, Germany
David Brodmann
KI-Werkstatt/Fachbereich 2, University of Applied Sciences Berlin, Wilhelminenhofstr. 75A, 12459 Berlin, Germany
Erik Rodner
University of Applied Sciences (HTW Berlin)
computer vision · machine learning · time series analysis