🤖 AI Summary
Isolation Forest (iForest) suffers from a lack of global interpretability due to its black-box nature. Method: This paper proposes the first global explanation framework for ensemble tree models, introducing a Decision Predicate Graph (DPG) to abstract iForest’s ensemble logic, designing the Inlier-Outlier Propagation Score (IOP-Score) to quantify feature-wise collaborative contributions along inlier-to-outlier propagation paths, and leveraging graph neural network–inspired propagation modeling for feature attribution. Contributions/Results: (1) It reveals, for the first time, the boundary evolution mechanism and feature collaboration underlying outlier detection in iForest; (2) it achieves end-to-end global interpretability—from local predictions to ensemble decision logic, and from output results to underlying inference processes; (3) empirical evaluation across multiple benchmark datasets demonstrates that IOP-Score accurately identifies discriminative features, significantly enhancing model transparency and trustworthiness.
📝 Abstract
The need to explain predictive models is well-established in modern machine learning. However, beyond model interpretability, understanding pre-processing methods is equally essential. Understanding how data modifications impact model performance improvements and potential biases and promoting a reliable pipeline is mandatory for developing robust machine learning solutions. Isolation Forest (iForest) is a widely used technique for outlier detection that performs well. Its effectiveness increases with the number of tree-based learners. However, this also complicates the explanation of outlier selection and the decision boundaries for inliers. This research introduces a novel Explainable AI (XAI) method, tackling the problem of global explainability. In detail, it aims to offer a global explanation for outlier detection to address its opaque nature. Our approach is based on the Decision Predicate Graph (DPG), which clarifies the logic of ensemble methods and provides both insights and a graph-based metric to explain how samples are identified as outliers using the proposed Inlier-Outlier Propagation Score (IOP-Score). Our proposal enhances iForest's explainability and provides a comprehensive view of the decision-making process, detailing which features contribute to outlier identification and how the model utilizes them. This method advances the state-of-the-art by providing insights into decision boundaries and a comprehensive view of holistic feature usage in outlier identification. -- thus promoting a fully explainable machine learning pipeline.