🤖 AI Summary
Isolation Forest (iForest) is widely adopted for efficient unsupervised anomaly detection, yet it lacks a rigorous theoretical account of why it works. This paper addresses that gap by modeling iForest's tree-growing process as a random walk and deriving the expected depth function from the walk's transition probabilities. With this formalism, the authors characterize iForest's inductive bias through case studies: it is less sensitive to anomalies located near the center of the data, and it adapts to parameter choices more readily than k-NN-based detectors. The result is an interpretable theoretical framework for iForest's effectiveness and a foundation for further theoretical work on anomaly detection.
📝 Abstract
Isolation Forest (iForest) stands out as a widely used unsupervised anomaly detector valued for its exceptional runtime efficiency and performance on large-scale tasks. Despite its widespread adoption, the theoretical foundation explaining iForest's success remains unclear. This paper theoretically investigates the conditions and extent of iForest's effectiveness by analyzing its inductive bias through the formulation of depth functions and growth processes. Since directly analyzing the depth function proves intractable due to iForest's random splitting mechanism, we model the growth process of iForest as a random walk, enabling us to derive the expected depth function using transition probabilities. Our case studies reveal key inductive biases: iForest exhibits lower sensitivity to central anomalies while demonstrating greater parameter adaptability compared to $k$-Nearest Neighbor anomaly detectors. Our study provides a theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.
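To make the notion of an expected depth concrete, the following is a minimal Monte Carlo sketch, not the paper's derivation. It assumes one-dimensional data, no subsampling, and no height limit, and it estimates the average number of uniformly random splits needed to isolate a point. The function names `isolation_depth` and `expected_depth` and the toy dataset are hypothetical, chosen only for illustration.

```python
import random


def isolation_depth(points, target, rng):
    """Count the random splits needed to isolate `target` (must be in `points`)."""
    current = list(points)
    depth = 0
    while len(current) > 1:
        lo, hi = min(current), max(current)
        if lo == hi:  # only duplicates remain; no further split is possible
            break
        split = rng.uniform(lo, hi)
        left = [p for p in current if p < split]
        right = [p for p in current if p >= split]
        if not left or not right:  # degenerate draw on the boundary; redraw
            continue
        # Follow the branch that still contains the target.
        current = left if target < split else right
        depth += 1
    return depth


def expected_depth(points, target, n_trees=2000, seed=0):
    """Average isolation depth of `target` over `n_trees` random trees."""
    rng = random.Random(seed)
    total = sum(isolation_depth(points, target, rng) for _ in range(n_trees))
    return total / n_trees


# Toy data (an assumption): two clusters, one point between them, one far outside.
left_cluster = [-1.2, -1.1, -1.0, -0.9, -0.8]
right_cluster = [0.8, 0.9, 1.0, 1.1, 1.2]
points = left_cluster + right_cluster + [0.0, 3.0]

central_depth = expected_depth(points, 0.0)   # anomaly between the clusters
marginal_depth = expected_depth(points, 3.0)  # anomaly at the margin

print("central anomaly depth:", central_depth)
print("marginal anomaly depth:", marginal_depth)
```

In this toy setting the central point needs at least two splits (one on each side), while the marginal point can be cut off with a single split, so its estimated depth is lower. A shallower depth means a higher anomaly score, which is consistent with the lower sensitivity to central anomalies described in the abstract.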