🤖 AI Summary
Industrial Anomaly Detection (IAD) faces a fundamental trade-off between detection accuracy and computational overhead: existing deep learning methods incur prohibitive training costs, while classical k-Nearest Neighbors (kNN) relies solely on off-the-shelf pretrained features and lacks spatial robustness. To address this, we propose Local Window Nearest Neighbors (LWinNN), a training-free approach built on a bounded translation-invariance assumption, a practical compromise between full invariance and no invariance. LWinNN matches pretrained features within local windows via kNN retrieval, improving spatial robustness without increasing inference complexity. Across multiple IAD benchmarks, LWinNN consistently outperforms classical kNN and several supervised/self-supervised baselines, improving detection accuracy by an average of 3.2% while reducing training time by over 99%. These results support the efficacy of its lightweight design in data-scarce industrial scenarios.
📝 Abstract
Industrial Anomaly Detection (IAD) is a subproblem within Computer Vision Anomaly Detection that has received increasing attention due to its applicability to real-life scenarios. Recent research has focused on extracting the most informative features, in contrast to older kNN-based methods that use only pretrained features. However, these recent methods are much more expensive to train, which can complicate real-life application. A careful study of related work with regard to transformation invariance leads to the observation that popular benchmarks require robustness to only minor translations. Building on this observation, we formulate LWinNN, a local-window-based approach that creates a middle ground between kNN-based methods with either complete or no translation invariance. Our experiments demonstrate that this small change increases accuracy considerably, while simultaneously decreasing both training and test time. This teaches us two things. First, the gap between kNN-based approaches and more complex state-of-the-art methodology can still be narrowed by effective use of the limited data available. Second, our assumption of requiring only limited translation invariance highlights potential areas of interest for future work and the need for more spatially diverse benchmarks, for which our method can hopefully serve as a new baseline. Our code can be found at https://github.com/marietteschonfeld/LWinNN.
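The bounded translation-invariance idea described above can be illustrated with a minimal sketch: instead of matching each test-image patch feature against the entire memory bank of normal features (full invariance) or only against the feature at the same position (no invariance), each patch is matched against normal features within a small spatial window around its own position. This is not the authors' implementation; the function name, NumPy representation, and window semantics below are illustrative assumptions.

```python
import numpy as np

def lwinn_score(query, bank, radius=2):
    """Local-window nearest-neighbor anomaly map (illustrative sketch).

    query:  (H, W, C) pretrained feature map of the test image.
    bank:   (N, H, W, C) feature maps of N normal training images.
    radius: half-width of the local search window; radius=0 gives no
            translation invariance, a window covering the whole map
            gives full invariance.
    Returns an (H, W) map of per-position anomaly scores.
    """
    H, W, C = query.shape
    scores = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            i0, i1 = max(0, i - radius), min(H, i + radius + 1)
            j0, j1 = max(0, j - radius), min(W, j + radius + 1)
            # Candidate normal features inside the local window,
            # pooled over all normal images in the bank.
            cands = bank[:, i0:i1, j0:j1, :].reshape(-1, C)
            # Distance to the nearest normal feature in the window.
            d = np.linalg.norm(cands - query[i, j], axis=1)
            scores[i, j] = d.min()
    return scores
```

An image-level detection score would then typically be an aggregate of this map, e.g. `scores.max()`. Because each position searches only a `(2*radius+1)**2` neighborhood rather than the whole bank, the search cost shrinks with the window size while small object shifts are still tolerated.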