CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts

📅 2025-07-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of evaluating image classifier robustness under realistic, continuously varying out-of-distribution (OOD) nuisance shifts. To this end, we introduce CNS-Bench — the first benchmark supporting continuous-severity nuisance shifts — built on diffusion models fine-tuned with LoRA adapters for controllable shift generation, combined with a filtering mechanism that removes generation failure cases to ensure shift fidelity. The benchmark comprises a high-quality OOD test suite spanning diverse nuisance types and severities, enabling fine-grained robustness analysis. Extensive evaluation of more than 40 state-of-the-art classifiers reveals that model rankings change as a function of nuisance type and severity. Moreover, the continuous assessment uncovers failure points that conventional binary (on/off) shift evaluation cannot capture, thereby establishing a more precise and interpretable paradigm for OOD robustness evaluation.

📝 Abstract
An important challenge when using computer vision models in the real world is to evaluate their performance in potential out-of-distribution (OOD) scenarios. While simple synthetic corruptions are commonly applied to test OOD robustness, they often fail to capture nuisance shifts that occur in the real world. Recently, diffusion models have been applied to generate realistic images for benchmarking, but they are restricted to binary nuisance shifts. In this work, we introduce CNS-Bench, a Continuous Nuisance Shift Benchmark to quantify OOD robustness of image classifiers for continuous and realistic generative nuisance shifts. CNS-Bench allows generating a wide range of individual nuisance shifts in continuous severities by applying LoRA adapters to diffusion models. To address failure cases, we propose a filtering mechanism that outperforms previous methods, thereby enabling reliable benchmarking with generative models. With the proposed benchmark, we perform a large-scale study to evaluate the robustness of more than 40 classifiers under various nuisance shifts. Through carefully designed comparisons and analyses, we find that model rankings can change for varying shifts and shift scales, which cannot be captured when applying common binary shifts. Additionally, we show that evaluating the model performance on a continuous scale allows the identification of model failure points, providing a more nuanced understanding of model robustness. Project page including code and data: https://genintel.github.io/CNS.
Problem

Research questions and friction points this paper is trying to address.

Evaluating image classifier robustness under continuous real-world nuisance shifts
Generating realistic continuous nuisance shifts for reliable OOD benchmarking
Identifying model failure points across varying shift scales and types
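The failure-point idea above can be made concrete: sweep a classifier's accuracy over a continuous shift scale and report the first severity at which accuracy drops below some fraction of clean accuracy. The following is a minimal illustrative sketch with hypothetical accuracy values — not the paper's actual code or definitions.

```python
import numpy as np

def failure_point(severities, accuracies, clean_acc, drop_frac=0.5):
    """Return the first severity at which accuracy falls below
    drop_frac * clean_acc, or None if it never does.
    (Hypothetical criterion for illustration only.)"""
    for s, acc in zip(severities, accuracies):
        if acc < drop_frac * clean_acc:
            return s
    return None

# Hypothetical accuracy curve over a continuous shift scale in [0, 1].
severities = np.linspace(0.0, 1.0, 11)
accuracies = 0.8 * np.exp(-3.0 * severities)  # monotone degradation
print(failure_point(severities, accuracies, clean_acc=0.8))
```

A binary benchmark that only tests severity 0 vs. severity 1 would miss where along the scale this threshold is crossed.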
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LoRA adapters for continuous nuisance shifts
Proposes filtering mechanism for reliable benchmarking
Benchmarks 40+ classifiers under various shifts
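The ranking-change finding can be illustrated with a toy example: two classifiers whose (hypothetical, made-up) accuracy curves cross as severity increases, so the model that ranks first on mild shifts ranks last on severe ones. This is a sketch of the evaluation logic, not the benchmark's implementation.

```python
import numpy as np

# Hypothetical accuracy curves for two classifiers over a continuous
# shift scale: model A is stronger on clean data, model B degrades slower.
severities = np.linspace(0.0, 1.0, 21)
acc = {
    "A": 0.85 * np.exp(-2.5 * severities),
    "B": 0.80 * np.exp(-1.0 * severities),
}

def ranking(at):
    """Models ordered best-first at the severity closest to `at`."""
    i = int(np.argmin(np.abs(severities - at)))
    return sorted(acc, key=lambda m: -acc[m][i])

print(ranking(0.0))  # ['A', 'B'] -- A leads on mild shifts
print(ranking(1.0))  # ['B', 'A'] -- ranking flips at high severity
```

A single binary shift would report only one of these two orderings, hiding the crossover.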