🤖 AI Summary
Autoencoder-based anomaly detection lacks statistical reliability after domain adaptation: when target-domain samples are scarce, the adaptation step introduces uncertainty that hinders valid p-value computation and false positive rate (FPR) control. To address this, the paper proposes STAND-DA, the first framework to systematically integrate selective inference into deep anomaly detection after domain adaptation. Building on representation-learning-based domain adaptation, it provides rigorous statistical inference for anomalies detected from autoencoder reconstructions, enabling closed-form p-value computation and theoretical FPR control. The method constructs a provably valid test statistic and develops a GPU-accelerated algorithm to improve scalability for large models. Experiments on synthetic and real-world benchmarks demonstrate STAND-DA's statistical validity, FPR control with limited target samples, and a substantial computational speedup over baseline approaches.
📝 Abstract
Anomaly detection (AD) plays a vital role across a wide range of domains, but its performance can deteriorate when applied to target domains with limited data. Domain Adaptation (DA) offers a solution by transferring knowledge from a related source domain with abundant data. However, this adaptation process can introduce additional uncertainty, making it difficult to draw statistically valid conclusions from AD results. In this paper, we propose STAND-DA, a novel framework for statistically rigorous Autoencoder-based AD after Representation Learning-based DA. Built on the Selective Inference (SI) framework, STAND-DA computes valid $p$-values for detected anomalies and rigorously controls the false positive rate below a pre-specified level $\alpha$ (e.g., 0.05). To address the computational challenges of applying SI to deep learning models, we develop a GPU-accelerated SI implementation that significantly improves both scalability and runtime performance. This advancement makes SI practically feasible for modern, large-scale deep architectures. Extensive experiments on synthetic and real-world datasets validate the theoretical results and computational efficiency of the proposed STAND-DA method.
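To illustrate why testing data-selected anomalies calls for selective inference, the following is a minimal toy sketch and not the STAND-DA procedure. It assumes i.i.d. standard normal scores with no true anomalies, flags the point with the largest absolute score, and compares a naive z-test with a p-value conditioned on the selection event (a truncated normal). The function names, the toy selection rule, and the parameters `n_points` and `n_trials` are all hypothetical choices made for this illustration; no autoencoder or domain adaptation step is modeled.

```python
import math
import random


def normal_sf(x):
    """Upper tail P(Z >= x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def naive_p_value(x_sel):
    """Two-sided z-test p-value that ignores how the point was selected."""
    return 2.0 * normal_sf(abs(x_sel))


def selective_p_value(x_sel, cutoff):
    """Two-sided p-value conditional on the selection event |x_sel| >= cutoff.

    Under the null, the selected score is a standard normal truncated to
    |x| >= cutoff, so the tail probability is rescaled by P(|Z| >= cutoff).
    """
    return normal_sf(abs(x_sel)) / normal_sf(cutoff)


def simulate_fpr(n_points=20, n_trials=5000, alpha=0.05, seed=0):
    """Estimate the false positive rate of both tests when no anomaly exists."""
    rng = random.Random(seed)
    naive_rejections = 0
    selective_rejections = 0
    for _ in range(n_trials):
        # Toy assumption: every score is pure noise, so any rejection is false.
        scores = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        # Toy selection rule: flag the point with the largest |score|.
        j = max(range(n_points), key=lambda i: abs(scores[i]))
        # The selection event is |scores[j]| >= largest |score| among the rest.
        cutoff = max(abs(scores[i]) for i in range(n_points) if i != j)
        if naive_p_value(scores[j]) <= alpha:
            naive_rejections += 1
        if selective_p_value(scores[j], cutoff) <= alpha:
            selective_rejections += 1
    return naive_rejections / n_trials, selective_rejections / n_trials


if __name__ == "__main__":
    naive_fpr, selective_fpr = simulate_fpr()
    print(f"naive FPR:     {naive_fpr:.3f}")
    print(f"selective FPR: {selective_fpr:.3f}")
```

In this toy setting the naive test rejects whenever any of the 20 scores passes the two-sided 5% threshold, so its false positive rate is about $1 - 0.95^{20} \approx 0.64$, whereas the conditional p-value is uniform under the null and stays near $\alpha$. The abstract describes applying the same conditioning principle to a far more complex selection event, defined by the DA step and the autoencoder, which is where the computational cost arises.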