Cluster-Based Random Forest Visualization and Interpretation

📅 2025-07-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Random forests achieve strong predictive performance but suffer from poor interpretability; existing explanation methods struggle to balance global understanding with structural fidelity. This paper proposes a clustering-based explanation framework grounded in inter-tree distance. We introduce a novel distance metric that jointly incorporates decision rules and prediction consistency, avoiding both per-tree enumeration and excessive simplification. Complementing this, we design two coordinated visualizations: the Feature Plot, which depicts feature importance and split distributions, and the Rule Plot, which visualizes representative decision paths and rule coverage. Together, they enable intuitive exploration of both cluster-level patterns and individual tree behavior. Experiments on the Glass dataset and user studies demonstrate that our approach more clearly reveals the internal structure of the forest, improving both the efficiency and accuracy with which users comprehend the ensemble's global behavior.

📝 Abstract
Random forests are a machine learning method used to automatically classify datasets and consist of a multitude of decision trees. While these random forests often have higher performance and generalize better than a single decision tree, they are also harder to interpret. This paper presents a visualization method and system to increase the interpretability of random forests. We cluster similar trees, which enables users to interpret how the model performs in general without needing to analyze each individual decision tree in detail, and without resorting to an oversimplified summary of the full forest. To meaningfully cluster the decision trees, we introduce a new distance metric that takes into account both the decision rules and the predictions of a pair of decision trees. We also propose two new visualization methods that visualize both clustered and individual decision trees: (1) the Feature Plot, which visualizes the topological position of features in the decision trees, and (2) the Rule Plot, which visualizes the decision rules of the decision trees. We demonstrate the efficacy of our approach through a case study on the "Glass" dataset, a relatively complex standard machine learning dataset, as well as a small user study.
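The abstract does not spell out the metric's formula. As a rough illustration of the idea, a pairwise tree distance that combines prediction disagreement with dissimilarity of the split rules could be sketched as below. The mixing weight `alpha`, both component distances, and the use of Iris as a stand-in for the Glass dataset are assumptions for illustration, not the paper's definitions:

```python
# Hedged sketch of an inter-tree distance combining (a) prediction
# disagreement on a reference set and (b) dissimilarity of the
# features each tree splits on. Not the paper's exact metric.
import numpy as np
from sklearn.datasets import load_iris  # stand-in for the Glass dataset
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

def prediction_distance(t1, t2, X):
    """Fraction of samples on which two trees disagree."""
    return np.mean(t1.predict(X) != t2.predict(X))

def feature_usage(tree, n_features):
    """Normalized histogram of the features used in split nodes."""
    feats = tree.tree_.feature
    feats = feats[feats >= 0]  # leaves are marked with a negative sentinel
    hist = np.bincount(feats, minlength=n_features).astype(float)
    return hist / max(hist.sum(), 1.0)

def rule_distance(t1, t2, n_features):
    """Total-variation distance between split-feature distributions."""
    return 0.5 * np.abs(
        feature_usage(t1, n_features) - feature_usage(t2, n_features)
    ).sum()

def tree_distance(t1, t2, X, alpha=0.5):
    # alpha is an assumed mixing weight, not taken from the paper
    n_features = X.shape[1]
    return (alpha * prediction_distance(t1, t2, X)
            + (1 - alpha) * rule_distance(t1, t2, n_features))

trees = forest.estimators_
D = np.array([[tree_distance(a, b, X) for b in trees] for a in trees])
```

Both components lie in [0, 1], so the combined distance does too; the resulting matrix `D` is symmetric with a zero diagonal and can feed directly into standard clustering routines.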
Problem

Research questions and friction points this paper is trying to address.

Enhancing interpretability of random forest models
Clustering similar trees to simplify model analysis
Visualizing decision rules and feature importance effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Clusters similar trees for interpretability
Introduces new distance metric for trees
Proposes Feature Plot and Rule Plot
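The clustering step over inter-tree distances can be sketched as follows. This is a hedged illustration, not the paper's pipeline: average-linkage hierarchical clustering and medoid representatives are assumed choices, and the toy matrix stands in for real inter-tree distances:

```python
# Hedged sketch: cluster trees of a forest from a precomputed pairwise
# distance matrix D, then pick one medoid tree per cluster as its
# representative. The random matrix below stands in for real distances.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
A = rng.random((10, 10))
D = (A + A.T) / 2          # symmetrize
np.fill_diagonal(D, 0.0)   # zero self-distance

# Average-linkage hierarchical clustering on the condensed distance vector
Z = linkage(squareform(D), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")

def medoid(D, idx):
    """Index of the cluster member minimizing total distance to the rest."""
    sub = D[np.ix_(idx, idx)]
    return idx[np.argmin(sub.sum(axis=1))]

representatives = [medoid(D, np.flatnonzero(labels == c))
                   for c in np.unique(labels)]
```

The medoid trees could then serve as the cluster-level inputs to visualizations such as the Feature Plot and Rule Plot described above.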
Max Sondag
University of Cologne
Computational Geometry, (Geo)visualization, Treemaps
Christofer Meinecke
Leipzig University
Information Visualization, Visual Analytics, Digital Humanities
Dennis Collaris
Eindhoven University of Technology, the Netherlands
Tatiana von Landesberger
University of Cologne, Germany
Stef van den Elzen
Eindhoven University of Technology, the Netherlands