🤖 AI Summary
Tree ensemble classifiers (e.g., random forests, XGBoost) are hard to interpret because they encode vast rule sets, often tens of thousands of rules; simplifying such a model loses fidelity, and critical anomalous rules are easily overlooked. To address these issues, this paper proposes a hierarchical interpretability framework that:

1. organizes rules into a semantics-driven hierarchy, grouping them by semantic similarity instead of flatly reducing the rule set;
2. introduces an anomaly-biased stratified sampling strategy that preserves rules deviating significantly from the dominant patterns at each level of the hierarchy;
3. provides a matrix-based hierarchical interactive visualization for multi-granularity exploration, from a global overview down to individual-rule inspection.

Experiments on multiple real-world datasets show that the framework maintains high fidelity while substantially improving coverage of both common and anomalous rules, helping domain experts understand the model's decision logic in depth.
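As a rough illustration of the second component, anomaly-biased reduction, here is a minimal Python sketch. The rule representation (a dict of per-feature intervals), the centroid-distance anomaly score, and the even split between common and anomalous rules are all simplifying assumptions for illustration, not the paper's actual method.

```python
import math

def rule_vector(rule, features):
    """Embed a rule as the midpoints of its feature intervals.
    Features the rule does not constrain default to the full (0, 1) range.
    (Hypothetical representation, assuming normalized features.)"""
    vec = []
    for f in features:
        lo, hi = rule["intervals"].get(f, (0.0, 1.0))
        vec.append((lo + hi) / 2.0)
    return vec

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def anomaly_biased_sample(rules, features, k):
    """Reduce `rules` to k rules while keeping both kinds:
    the k//2 nearest the centroid (common patterns) and the
    remainder farthest from it (anomalous rules)."""
    vecs = [rule_vector(r, features) for r in rules]
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    # Sort rules by how far they deviate from the dominant pattern.
    scored = sorted(zip(rules, vecs), key=lambda rv: distance(rv[1], centroid))
    common = [r for r, _ in scored[: k // 2]]
    anomalous = [r for r, _ in scored[len(scored) - (k - k // 2):]]
    return common + anomalous
```

In the paper's framework this biasing is applied per hierarchy level (stratified), so outliers survive reduction at every granularity; the sketch shows a single level only.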
📝 Abstract
The high performance of tree ensemble classifiers benefits from a large set of rules, which, in turn, makes the models hard to understand. To improve interpretability, existing methods extract a subset of rules for approximation using model reduction techniques. However, by focusing on the reduced rule set, these methods often lose fidelity and ignore anomalous rules that, despite their infrequency, play crucial roles in real-world applications. This paper introduces a scalable visual analysis method to explain tree ensemble classifiers that contain tens of thousands of rules. The key idea is to address the issue of losing fidelity by adaptively organizing the rules as a hierarchy rather than reducing them. To ensure the inclusion of anomalous rules, we develop an anomaly-biased model reduction method to prioritize these rules at each hierarchical level. Synergized with this hierarchical organization of rules, we develop a matrix-based hierarchical visualization to support exploration at different levels of detail. Our quantitative experiments and case studies demonstrate how our method fosters a deeper understanding of both common and anomalous rules, thereby enhancing interpretability without sacrificing comprehensiveness.
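The hierarchical organization described above can be sketched in miniature: rules that test similar feature sets are grouped together, and applying the grouping recursively with loosening thresholds yields levels of a hierarchy. The greedy grouping pass and the Jaccard similarity over predicate features below are illustrative assumptions, not the paper's algorithm.

```python
def jaccard(a, b):
    """Similarity of two feature sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def group_by_similarity(rules, threshold):
    """Greedy single pass: each rule joins the first group whose
    representative (first member) tests a similar-enough feature set,
    otherwise it starts a new group. One such pass forms one hierarchy
    level; re-grouping the groups with a lower threshold forms the next."""
    groups = []
    for rule in rules:
        feats = rule["intervals"].keys()
        for g in groups:
            if jaccard(feats, g[0]["intervals"].keys()) >= threshold:
                g.append(rule)
                break
        else:
            groups.append([rule])
    return groups
```

A matrix-based visualization would then render one row per group at the chosen level, letting the viewer drill down from group summaries to individual rules.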