VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction

Date: 2026-03-14
AI Summary
This study addresses the detection of logical-rule violations in industrial visual inspection under real-world nuisance factors such as cluttered backgrounds, illumination variations, and motion blur, which distract existing vision-centric methods. To this end, the authors introduce VID-AD, a dataset comprising 10 manufacturing scenarios captured under five imaging conditions, yielding 50 one-class tasks and 10,395 images. Each scenario is defined by two logical constraints, and the benchmark holds logical states fixed while visual perturbations are systematically varied. The work further proposes a language-based anomaly detection framework that generates textual descriptions from normal samples only and applies contrastive learning with contradiction-based negative texts to increase sensitivity to logical attributes. Experiments show consistent improvements over baselines across the evaluated settings, indicating that the method captures logical attributes rather than low-level visual discrepancies.
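The task count follows from crossing scenarios with imaging conditions, and the anomaly types follow from the two constraints per scenario (each violated alone, or both together). A minimal sketch of that bookkeeping is given below; the scenario and condition names are hypothetical placeholders, since only the counts are stated here, and the function names are illustrative rather than part of the released dataset tooling.

```python
from itertools import product

# Hypothetical placeholder names: the source states only the counts (10 and 5).
SCENARIOS = [f"scenario_{i:02d}" for i in range(1, 11)]
CONDITIONS = [f"condition_{j}" for j in range(1, 6)]


def enumerate_tasks(scenarios, conditions):
    """Return one (scenario, condition) pair per one-class task."""
    return list(product(scenarios, conditions))


def violation_types(constraint_a, constraint_b):
    """List the anomaly types implied by two logical constraints:
    each constraint violated alone, plus both violated together."""
    return [(constraint_a,), (constraint_b,), (constraint_a, constraint_b)]


tasks = enumerate_tasks(SCENARIOS, CONDITIONS)
print(len(tasks))  # 10 scenarios x 5 conditions = 50 tasks

# "quantity" and "placement" are two of the constraint categories named in the abstract;
# pairing them here is only an illustration, not a scenario from the dataset.
print(violation_types("quantity", "placement"))
```

Keeping the logical state (which constraints are violated) separate from the imaging condition mirrors the benchmark's design: the same violation types can be evaluated under every condition.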

πŸ“ Abstract
Logical anomaly detection in industrial inspection remains challenging due to variations in visual appearance (e.g., background clutter, illumination shift, and blur), which often distract vision-centric detectors from identifying rule-level violations. However, existing benchmarks rarely provide controlled settings where logical states are fixed while such nuisance factors vary. To address this gap, we introduce VID-AD, a dataset for logical anomaly detection under vision-induced distraction. It comprises 10 manufacturing scenarios and five capture conditions, totaling 50 one-class tasks and 10,395 images. Each scenario is defined by two logical constraints selected from quantity, length, type, placement, and relation, with anomalies including both single-constraint and combined violations. We further propose a language-based anomaly detection framework that relies solely on text descriptions generated from normal images. Using contrastive learning with positive texts and contradiction-based negative texts synthesized from these descriptions, our method learns embeddings that capture logical attributes rather than low-level features. Extensive experiments demonstrate consistent improvements over baselines across the evaluated settings. The dataset is available at: https://github.com/nkthiroto/VID-AD.
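The contrastive objective described above can be illustrated with a small sketch. The abstract does not specify the loss, the text encoder, or the temperature, so everything below is an assumption: an InfoNCE-style loss over cosine similarities, toy three-dimensional vectors standing in for text embeddings, and hypothetical function names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not confirmed by the source).

    Computes -log softmax of the anchor-positive similarity against the
    anchor-negative similarities, using a numerically stable log-sum-exp.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Toy stand-ins for text embeddings (hypothetical values).
anchor = [1.0, 0.0, 0.2]    # description generated from a normal image
positive = [0.9, 0.1, 0.2]  # text consistent with that description
negative = [0.1, 1.0, 0.2]  # contradiction-based negative text

well_aligned = contrastive_loss(anchor, positive, [negative])
misaligned = contrastive_loss(anchor, negative, [positive])
print(well_aligned < misaligned)  # prints True
```

Under this assumed loss, the value is small when the anchor sits close to the positive text and far from the contradictory text, which is the behavior the abstract attributes to embeddings that capture logical attributes.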
Problem

Research questions and friction points this paper is trying to address.

logical anomaly detection
vision-induced distraction
industrial inspection
visual appearance variation
rule-level violations
Innovation

Methods, ideas, or system contributions that make the work stand out.

logical anomaly detection
vision-induced distraction
language-based anomaly detection
contrastive learning
industrial inspection
Authors
Hiroto Nakata
Department of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui-city, 910-8507, Japan
Yawen Zou
Graduate School of Science and Engineering, University of Toyama, 3190 Gofuku, Toyama-city, 930-8555, Japan
Shunsuke Sakai
University of Fukui
Computer Vision, Neural Networks, Anomaly Detection
Shun Maeda
Department of Engineering, University of Fukui, 3-9-1 Bunkyo, Fukui-city, 910-8507, Japan
Chunzhi Gu
Toyohashi University of Technology
Visual Data Computing, Pattern Recognition
Yijin Wei
Graduate School of Science and Engineering, University of Toyama, 3190 Gofuku, Toyama-city, 930-8555, Japan
Shangce Gao
Faculty of Engineering, University of Toyama, 3190 Gofuku, Toyama-city, 930-8555, Japan
Chao Zhang
Specially Appointed Professor, University of Toyama
Computer Vision, Anomaly Detection