🤖 AI Summary
Random forests achieve strong predictive performance but suffer from poor interpretability; existing explanation methods struggle to balance global understanding with structural fidelity. This paper proposes a clustering-based explanation framework grounded in inter-tree distance. We introduce a novel distance metric that jointly incorporates decision rules and prediction consistency, avoiding both per-tree enumeration and excessive simplification. Complementing this, we design two coordinated visualizations: the Feature Plot, which depicts feature importance and split distributions, and the Rule Plot, which visualizes representative decision paths and rule coverage. Together, they enable intuitive exploration of both cluster-level patterns and individual tree behavior. Experiments on the Glass dataset and a user study show that our approach more effectively reveals the internal structure of the forest, improving both the efficiency and accuracy with which users comprehend the ensemble's global behavior.
📝 Abstract
Random forests are a machine learning method used to automatically classify datasets and consist of a multitude of decision trees. While random forests often perform better and generalize better than a single decision tree, they are also harder to interpret. This paper presents a visualization method and system to increase the interpretability of random forests. We cluster similar trees, which enables users to interpret how the model performs in general without needing to analyze each individual decision tree in detail or to interpret an oversimplified summary of the full forest. To meaningfully cluster the decision trees, we introduce a new distance metric that takes into account both the decision rules and the predictions of a pair of decision trees. We also propose two new visualization methods that visualize both clustered and individual decision trees: (1) the Feature Plot, which visualizes the topological position of features in the decision trees, and (2) the Rule Plot, which visualizes the decision rules of the decision trees. We demonstrate the efficacy of our approach through a case study on the "Glass" dataset, a relatively complex standard machine learning benchmark, as well as a small user study.
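To make the clustering idea concrete, the sketch below illustrates one plausible way to combine prediction agreement with a rule-based signal into a single tree-to-tree distance. This is not the paper's actual metric: the function names, the feature-usage proxy for decision rules, and the mixing parameter `alpha` are all assumptions introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def prediction_distance(t1, t2, X):
    """Fraction of samples on which the two trees disagree."""
    return np.mean(t1.predict(X) != t2.predict(X))

def rule_distance(t1, t2, n_features):
    """Euclidean distance between normalized feature-usage counts.
    A crude stand-in for comparing the trees' decision rules; the
    paper's metric operates on the rules themselves."""
    def usage(tree):
        feats = tree.tree_.feature          # internal split features; -2 marks leaves
        counts = np.bincount(feats[feats >= 0], minlength=n_features)
        return counts / max(counts.sum(), 1)
    return np.linalg.norm(usage(t1) - usage(t2))

def tree_distance(t1, t2, X, n_features, alpha=0.5):
    """Weighted mix of both signals; alpha is an assumed parameter."""
    return (alpha * prediction_distance(t1, t2, X)
            + (1 - alpha) * rule_distance(t1, t2, n_features))

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Pairwise distance matrix over the forest's trees; this matrix could
# then be fed to any off-the-shelf clustering algorithm.
trees = forest.estimators_
D = np.array([[tree_distance(a, b, X, X.shape[1]) for b in trees]
              for a in trees])
print(D.shape)  # (10, 10)
```

The resulting symmetric matrix `D` is exactly the kind of input that standard distance-based clustering (e.g., hierarchical clustering) consumes, which is how a forest of many trees can be reduced to a handful of representative clusters.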