🤖 AI Summary
To address the weak robustness of image outlier detection under adversarial settings, particularly against unseen outliers, this paper proposes RODEO, a data-centric framework for robust outlier detection. Methodologically, it introduces: (1) an adaptive outlier exposure (OE) mechanism guided by both text and image information, which yields training outliers that are diverse, conceptually distinguishable from the inliers, and close to the inlier distribution; (2) an outlier synthesis pipeline built on a text-to-image model that generates these near-distribution outliers; and (3) adversarial training of the detector on the synthesized outliers, combining OE with adversarial robustness. Evaluated on multiple benchmarks, the framework achieves up to 12.7% AUC improvement, significantly improving detection of unseen outliers, especially in adversarial settings, while maintaining low false-positive rates. Qualitative analysis confirms that the generated samples are semantically clear, visually faithful, and lie close to the inlier decision boundary.
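The three properties required of the exposed outliers (diversity, conceptual separation from inliers, and proximity to the inlier distribution) can be made concrete as a filter over candidate samples in an embedding space. The sketch below is a hypothetical illustration, not RODEO's published procedure: it assumes each candidate and inlier has already been mapped to a feature vector (for example by an image encoder), and the function name `select_exposed_outliers` and the thresholds `low`, `high`, and `dup_threshold` are illustrative assumptions.

```python
import numpy as np


def _normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def select_exposed_outliers(candidate_emb, inlier_emb, low=0.5, high=0.95, dup_threshold=0.99):
    """Hypothetical filter: return indices of candidates to use as exposed outliers.

    - proximity: similarity to the nearest inlier must be at least `low`
    - conceptual separation: that similarity must stay below `high`
    - diversity: greedily drop candidates nearly identical to ones already kept
    """
    cand = _normalize(np.asarray(candidate_emb, dtype=float))
    inl = _normalize(np.asarray(inlier_emb, dtype=float))

    # Cosine similarity of every candidate to its most similar inlier.
    nearest = (cand @ inl.T).max(axis=1)
    in_band = np.flatnonzero((nearest >= low) & (nearest < high))

    kept = []
    for i in in_band:
        # Keep i only if it is not a near-duplicate of any already-kept candidate.
        if all(cand[i] @ cand[j] < dup_threshold for j in kept):
            kept.append(int(i))
    return kept
```

For example, with a single inlier at `[1, 0]`, the candidates `[1, 0]` (too close), `[0, 1]` (too far), and `[1, 1.01]` (a near-duplicate of `[1, 1]`) are rejected, while `[1, 1]` and `[1, -1]` are kept. Bounding similarity from both sides is one simple way to encode "near-distribution but conceptually different".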
📝 Abstract
In recent years, there have been significant improvements in various forms of image outlier detection. However, outlier detection performance under adversarial settings lags far behind that in standard settings. This is due to the lack of effective exposure to adversarial scenarios during training, especially on unseen outliers, which prevents detection models from learning robust features. To bridge this gap, we introduce RODEO, a data-centric approach that generates effective outliers for robust outlier detection. More specifically, we show that combining outlier exposure (OE) with adversarial training can be an effective strategy for this purpose, as long as the exposed training outliers meet certain criteria: diversity, conceptual differentiability from the inlier samples, and similarity to them. We leverage a text-to-image model to achieve this goal. We demonstrate both quantitatively and qualitatively that our adaptive OE method effectively generates "diverse" and "near-distribution" outliers by leveraging information from both the text and image domains. Moreover, our experimental results show that using our synthesized outliers significantly enhances the performance of the outlier detector, particularly in adversarial settings.
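The core combination, outlier exposure plus adversarial training, can be illustrated with a deliberately tiny model. The sketch below is a toy under stated assumptions, not the authors' implementation: it uses a linear logistic detector in NumPy instead of a deep network, treats inliers as label 0 and exposed outliers as label 1, and uses a one-step FGSM attack as a stand-in for whatever attack the paper trains against. The names `train_adversarial_oe` and `outlier_score` and the values of `eps`, `lr`, and `epochs` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_adversarial_oe(inliers, outliers, eps=0.1, lr=0.5, epochs=200, seed=0):
    """Toy OE + adversarial training: logistic detector, label 0 = inlier, 1 = exposed outlier."""
    rng = np.random.default_rng(seed)
    x = np.vstack([inliers, outliers]).astype(float)
    y = np.concatenate([np.zeros(len(inliers)), np.ones(len(outliers))])
    w = rng.normal(scale=0.01, size=x.shape[1])
    b = 0.0
    for _ in range(epochs):
        # FGSM attack: the gradient of binary cross-entropy w.r.t. the input is (p - y) * w,
        # so stepping along its sign pushes each sample toward the wrong side.
        p = sigmoid(x @ w + b)
        grad_x = (p - y)[:, None] * w[None, :]
        x_adv = x + eps * np.sign(grad_x)
        # Gradient-descent step on the perturbed batch (mean BCE loss).
        err = sigmoid(x_adv @ w + b) - y
        w -= lr * (x_adv.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def outlier_score(x, w, b):
    """Higher score = more likely an outlier."""
    return sigmoid(np.asarray(x, dtype=float) @ w + b)
```

Usage: with inliers drawn around the origin and stand-in synthesized outliers drawn around `(2, 2)`, the trained detector assigns a score above 0.5 at `(2, 2)` and below 0.5 at `(0, 0)`. Because the outliers are only useful if they sit near the inlier distribution, the quality of the exposed samples directly shapes where this robust boundary ends up.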