🤖 AI Summary
Random forests achieve strong predictive performance but suffer from poor interpretability; existing explanation methods struggle to balance global understanding with structural fidelity. This paper proposes a clustering-based explanation framework grounded in inter-tree distance. We introduce a novel distance metric that jointly incorporates decision rules and prediction consistency, avoiding both per-tree enumeration and excessive simplification. Complementing this, we design two coordinated visualizations: the Feature Plot, which depicts feature importance and split distributions, and the Rule Plot, which visualizes representative decision paths and rule coverage. Together, they enable intuitive exploration of both cluster-level patterns and individual tree behavior. Experiments on the Glass dataset and a user study show that our approach more effectively reveals the internal structure of the forest, improving both the efficiency and accuracy with which users comprehend the ensemble's global behavior.
📝 Abstract
Random forests are a machine learning method used to automatically classify datasets and consist of a multitude of decision trees. While random forests often perform better and generalize better than a single decision tree, they are also harder to interpret. This paper presents a visualization method and system to increase the interpretability of random forests. We cluster similar trees, which enables users to interpret how the model performs in general without needing to analyze each individual decision tree in detail or to interpret an oversimplified summary of the full forest. To meaningfully cluster the decision trees, we introduce a new distance metric that takes into account both the decision rules and the predictions of a pair of decision trees. We also propose two new visualization methods that visualize both clustered and individual decision trees: (1) the Feature Plot, which visualizes the topological position of features in the decision trees, and (2) the Rule Plot, which visualizes the decision rules of the decision trees. We demonstrate the efficacy of our approach through a case study on the "Glass" dataset, a relatively complex standard machine learning benchmark, as well as a small user study.
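To make the clustering idea concrete, the sketch below illustrates one plausible way to combine prediction agreement with a rule-based signal into a single tree-to-tree distance. This is not the paper's actual metric: the function names, the feature-usage proxy for decision rules, and the mixing parameter `alpha` are all assumptions introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def prediction_distance(t1, t2, X):
    """Fraction of samples on which the two trees disagree."""
    return np.mean(t1.predict(X) != t2.predict(X))

def rule_distance(t1, t2, n_features):
    """Euclidean distance between normalized feature-usage counts.
    A crude stand-in for comparing the trees' decision rules; the
    paper's metric operates on the rules themselves."""
    def usage(tree):
        feats = tree.tree_.feature          # internal split features; -2 marks leaves
        counts = np.bincount(feats[feats >= 0], minlength=n_features)
        return counts / max(counts.sum(), 1)
    return np.linalg.norm(usage(t1) - usage(t2))

def tree_distance(t1, t2, X, n_features, alpha=0.5):
    """Weighted mix of both signals; alpha is an assumed parameter."""
    return (alpha * prediction_distance(t1, t2, X)
            + (1 - alpha) * rule_distance(t1, t2, n_features))

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Pairwise distance matrix over the forest's trees; this matrix could
# then be fed to any off-the-shelf clustering algorithm.
trees = forest.estimators_
D = np.array([[tree_distance(a, b, X, X.shape[1]) for b in trees]
              for a in trees])
print(D.shape)  # (10, 10)
```

The resulting symmetric matrix `D` is exactly the kind of input that standard distance-based clustering (e.g., hierarchical clustering) consumes, which is how a forest of many trees can be reduced to a handful of representative clusters.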