🤖 AI Summary
This work addresses the contextual nature of anomalies. Existing methods often treat abnormality as an intrinsic property of an object, overlooking that the same entity (e.g., a person running) may be normal in one context (a track) but anomalous in another (a highway). To this end, we present a systematic study of context-dependent anomaly detection in the visual domain and introduce a conditional compatibility learning framework that uses vision-language representations to model the compatibility between foreground entities and their surrounding contexts under limited supervision. We further construct CAAD-3K, a benchmark dataset that holds object identity fixed while varying only the contextual background, along with a context-controllable data generation strategy. Experiments show that our approach substantially outperforms existing methods on CAAD-3K and achieves state-of-the-art performance on MVTec-AD and VisA, indicating that explicit context modeling complements traditional structural anomaly detection.
📝 Abstract
Anomaly detection is often formulated under the assumption that abnormality is an intrinsic property of an observation, independent of context. This assumption breaks down in many real-world settings, where the same object or action may be normal or anomalous depending on latent contextual factors (e.g., running on a track versus on a highway). We revisit contextual anomaly detection, classically defined as context-dependent abnormality, and operationalize it in the visual domain, where anomaly labels depend on subject-context compatibility rather than intrinsic appearance. To enable systematic study of this setting, we introduce CAAD-3K, a benchmark that isolates contextual anomalies by controlling subject identity while varying context. We further propose a conditional compatibility learning framework that leverages vision-language representations to model subject-context relationships under limited supervision. Our method substantially outperforms existing approaches on CAAD-3K and achieves state-of-the-art performance on MVTec-AD and VisA, demonstrating that modeling context dependence complements traditional structural anomaly detection. Our code and dataset will be publicly released.
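To make the idea of subject-context compatibility concrete, the following is a minimal illustrative sketch and not the authors' framework, whose details the abstract does not give. It assumes that a subject and a context have each already been mapped to an embedding vector (for instance by some vision-language encoder) and scores an image as anomalous when the two embeddings are dissimilar. The function names, the cosine-based score, and the three-dimensional toy vectors are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contextual_anomaly_score(subject_emb, context_emb):
    """Hypothetical score: low compatibility means high anomaly.

    Assumption: compatibility is approximated by cosine similarity
    between a subject embedding and a context embedding.
    """
    return 1.0 - cosine_similarity(subject_emb, context_emb)


# Toy embeddings, hand-picked for illustration only. In practice they
# would come from an encoder, which is outside the scope of this sketch.
RUNNER = [0.9, 0.1, 0.2]
TRACK = [0.8, 0.2, 0.1]
HIGHWAY = [0.1, 0.9, 0.3]

# The same subject is scored against two different contexts.
score_track = contextual_anomaly_score(RUNNER, TRACK)
score_highway = contextual_anomaly_score(RUNNER, HIGHWAY)

print(f"runner on track:   {score_track:.3f}")
print(f"runner on highway: {score_highway:.3f}")
```

With these toy vectors the same subject receives a higher anomaly score on the highway than on the track, which mirrors the running example in the abstract: the label depends on the pairing, not on the subject's appearance alone.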