🤖 AI Summary
Image anomaly localization is critical in medical diagnosis and industrial inspection, yet existing approaches based on generative models, particularly diffusion models, inherit the bias and uncertainty of the underlying model and do not quantify the risk of false-positive detections. This paper applies selective inference to diffusion-based anomaly localization, establishing a statistical inference framework: for each detected anomalous region, it performs a conditional hypothesis test and outputs a p-value that quantifies the statistical significance of the detection. Unlike conventional methods that lack such guarantees, the approach enables anomaly localization with a controlled false-positive rate. Experiments on medical and industrial applications indicate that the method effectively controls the risk of false-positive detection, supporting its use in high-stakes applications.
📝 Abstract
Anomaly localization in images (identifying regions that deviate from expected patterns) is vital in applications such as medical diagnosis and industrial inspection. A recent trend is to use image generation models for anomaly localization: these models generate normal-looking counterparts of anomalous images, which allows flexible and adaptive anomaly localization. However, such methods inherit the uncertainty and bias implicitly embedded in the generative model, raising concerns about their reliability. To address this, we propose a statistical framework based on selective inference to quantify the significance of detected anomalous regions. Our method yields $p$-values that quantify the false-positive detection rate, providing a principled measure of reliability. As a proof of concept, we consider anomaly localization using a diffusion model and its applications to medical diagnosis and industrial inspection. The results indicate that the proposed method effectively controls the risk of false-positive detection, supporting its use in high-stakes decision-making tasks.
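To illustrate why conditioning on the detection step matters, the following is a minimal toy sketch, not the paper's method. It assumes each pixel's difference from a fixed reference is independent Gaussian noise with known standard deviation, and that a pixel is flagged when its difference exceeds a threshold. All names (`normal_sf`, `naive_p_value`, `selective_p_value`, `false_positive_rates`) and parameter values are hypothetical. In this simplified setting the selective $p$-value is a tail probability of a truncated normal distribution; the diffusion-based selection event treated in the paper is considerably more involved.

```python
import math
import random


def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def naive_p_value(x, sigma=1.0):
    """One-sided p-value for H0: mean = 0, ignoring how the pixel was chosen."""
    return normal_sf(x / sigma)


def selective_p_value(x, threshold, sigma=1.0):
    """One-sided p-value for H0: mean = 0, conditional on x > threshold.

    Under H0, X given X > threshold follows a truncated normal, so
    P(X >= x | X > threshold) = sf(x) / sf(threshold).
    """
    return normal_sf(x / sigma) / normal_sf(threshold / sigma)


def false_positive_rates(n_pixels=200_000, threshold=1.5, alpha=0.05, seed=0):
    """Simulate an anomaly-free image and compare both tests on flagged pixels."""
    rng = random.Random(seed)
    # Anomaly-free toy image: every pixel difference is pure noise.
    diffs = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
    # Detection step: flag pixels whose difference exceeds the threshold.
    flagged = [d for d in diffs if d > threshold]
    # Fraction of flagged pixels declared significant by each test.
    naive = sum(naive_p_value(d) < alpha for d in flagged) / len(flagged)
    selective = sum(
        selective_p_value(d, threshold) < alpha for d in flagged
    ) / len(flagged)
    return naive, selective


if __name__ == "__main__":
    naive_rate, selective_rate = false_positive_rates()
    print(f"naive false-positive rate among flagged pixels:     {naive_rate:.3f}")
    print(f"selective false-positive rate among flagged pixels: {selective_rate:.3f}")
```

Because the simulated image contains no anomalies, every rejection is a false positive. Under these assumptions the naive test rejects for most flagged pixels, far more often than the nominal level `alpha`, while the selective test rejects at a rate close to `alpha`.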