🤖 AI Summary
This paper addresses the unsupervised detection of local density anomalies, such as over-dense or under-dense regions, in multivariate data. We propose EagleEye, a novel method that models the label sequence within each sample’s neighborhood as a coin-flip process and identifies statistically significant deviations via sequential rank-based hypothesis testing. Our contributions include: (i) the first coin-flip statistical modeling framework based on neighborhood label sequences; (ii) the first prior-free, nonparametric estimator of local signal purity; and (iii) a design that is conceptually simple, inherently parallelizable, and scalable to high dimensions. Evaluations demonstrate EagleEye’s effectiveness: it detects faint multidimensional anomalies comprising only 0.1% of synthetic data; identifies sparse resonance decay events (0.3% prevalence) in simulated Large Hadron Collider (LHC) data; and uncovers previously unreported abrupt regional temperature shifts in global climate records. Together, these results support its robustness and its potential for scientific discovery.
📝 Abstract
Detecting localised density differences in multivariate data is a crucial task in computational science. Such anomalies can indicate a critical system failure, lead to a groundbreaking scientific discovery, or reveal unexpected changes in data distribution. We introduce EagleEye, an anomaly detection method that compares two multivariate datasets with the aim of identifying local density anomalies, namely over- or under-densities affecting only localised regions of the feature space. Anomalies are detected by modelling, for each point, the ordered sequence of its neighbours' membership labels as a coin-flipping process and monitoring deviations from the expected behaviour of such a process. A unique advantage of our method is its ability to provide an accurate, entirely unsupervised estimate of the local signal purity. We demonstrate its effectiveness through experiments on both synthetic and real-world datasets. In synthetic data, EagleEye accurately detects anomalies in multiple dimensions even when they affect a tiny fraction of the data. When applied to a challenging resonant anomaly detection benchmark task in simulated Large Hadron Collider data, EagleEye successfully identifies particle decay events present in just 0.3% of the dataset. In global temperature data, EagleEye uncovers previously unidentified, geographically localised changes in temperature fields that occurred in the most recent years. Thanks to its key advantages of conceptual simplicity, computational efficiency, trivial parallelisation, and scalability, EagleEye is widely applicable across many fields.
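To make the coin-flipping idea concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that, under the null hypothesis, each neighbour's membership label is an independent Bernoulli draw with success probability equal to the pooled fraction of test points, and that a deviation is measured by the smallest binomial upper-tail p-value over all neighbourhood sizes up to a cutoff. The names `binom_upper_tail`, `coin_flip_scores`, and `k_max`, the brute-force neighbour search, and the specific test statistic are all assumptions introduced here for illustration. Only over-densities in the test set are scored.

```python
import math
import numpy as np


def binom_upper_tail(successes, trials, p):
    """Return P[X >= successes] for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, j) * p**j * (1.0 - p) ** (trials - j)
        for j in range(successes, trials + 1)
    )


def coin_flip_scores(reference, test, k_max=20):
    """Score each test point by how unlikely its neighbours' labels are.

    Assumption (not from the paper): the score is -log of the smallest
    binomial upper-tail p-value over neighbourhood sizes 1..k_max.
    """
    pooled = np.vstack([reference, test])
    # Label 0 marks a reference point, label 1 marks a test point.
    labels = np.concatenate(
        [np.zeros(len(reference), dtype=int), np.ones(len(test), dtype=int)]
    )
    # Null "coin" bias: the pooled fraction of test points.
    p_null = len(test) / len(pooled)

    scores = np.empty(len(test))
    for i, x in enumerate(test):
        dist = np.linalg.norm(pooled - x, axis=1)
        dist[len(reference) + i] = np.inf  # exclude the query point itself
        order = np.argsort(dist)[:k_max]   # neighbours by increasing distance
        heads = np.cumsum(labels[order])   # test-labelled count among first k
        p_min = min(
            binom_upper_tail(int(heads[k - 1]), k, p_null)
            for k in range(1, k_max + 1)
        )
        scores[i] = -math.log(p_min)
    return scores


# Hypothetical toy data: uniform background plus a small, tight over-density.
rng = np.random.default_rng(0)
reference = rng.uniform(size=(500, 2))
background = rng.uniform(size=(500, 2))
blob = rng.normal(loc=0.5, scale=0.01, size=(30, 2))
scores = coin_flip_scores(reference, np.vstack([background, blob]))
print("mean background score:", scores[:500].mean())
print("mean over-density score:", scores[500:].mean())
```

In this toy setup the points in the injected cluster see almost exclusively test-labelled neighbours, so their label sequences look like a heavily biased coin and receive larger scores than typical background points. Each point is scored independently, which is why a scheme of this kind parallelises trivially.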