The Generalized Proximity Forest

📅 2025-11-23
🤖 AI Summary
This work addresses the limited applicability of Random Forest (RF)-based proximity measures. We propose the Generalized Proximity Forest (GPF), a framework that extends the proximity paradigm to general supervised distance-based learning, encompassing classification, regression, and time-series tasks. GPF constructs instance-level similarity graphs via customizable distance metrics, decoupling proximity computation from base-model architecture and eliminating dependence on RF-specific structures. We further introduce a Proximity Forest variant for regression and propose using GPF as a meta-learning framework, which extends supervised imputation to arbitrary pre-trained classifiers. Experiments across diverse benchmark tasks demonstrate advantages of GPF over conventional RF and k-NN baselines. It offers greater flexibility (supporting heterogeneous tasks and distance functions), enhanced robustness to noise and distribution shift, and improved interpretability through explicit, graph-based similarity modeling.

📝 Abstract
Recent work has demonstrated the utility of Random Forest (RF) proximities for various supervised machine learning tasks, including outlier detection, missing data imputation, and visualization. However, the utility of the RF proximities depends upon the success of the RF model, which itself is not the ideal model in all contexts. RF proximities have recently been extended to time series by means of the distance-based Proximity Forest (PF) model, among others, affording time series analysis with the benefits of RF proximities. In this work, we introduce the generalized PF model, thereby extending RF proximities to all contexts in which supervised distance-based machine learning can occur. Additionally, we introduce a variant of the PF model for regression tasks. We also introduce the notion of using the generalized PF model as a meta-learning framework, extending supervised imputation capability to any pre-trained classifier. We experimentally demonstrate the unique advantages of the generalized PF model compared with both the RF model and the $k$-nearest neighbors model.
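
To make the idea of a distance-based proximity concrete, here is a minimal sketch, not the authors' implementation. It assumes a simplified Proximity Forest style split (one random exemplar per class at each node, each instance routed to its nearest exemplar under a user-supplied distance) and the classic shared-leaf proximity (the fraction of trees in which two instances land in the same leaf). The paper's own proximity definition and split selection may differ. The names `assign_leaves`, `proximities`, and `manhattan` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def manhattan(a, b):
    """Example distance; any callable dist(a, b) -> float could be swapped in."""
    return sum(abs(u - v) for u, v in zip(a, b))


def assign_leaves(idx, X, y, dist, rng, path=(), max_depth=8):
    """Map each training index in idx to a leaf id (the branch path taken).

    Assumption: a simplified split with one random exemplar per class and
    no search over candidate splits or distance measures.
    """
    labels = sorted({y[i] for i in idx})
    if len(labels) <= 1 or len(path) >= max_depth:
        return {i: path for i in idx}  # pure node or depth limit: make a leaf
    exemplars = [rng.choice([i for i in idx if y[i] == c]) for c in labels]
    branches = defaultdict(list)
    for i in idx:
        # Route the instance to the branch of its nearest exemplar.
        b = min(range(len(exemplars)), key=lambda k: dist(X[i], X[exemplars[k]]))
        branches[b].append(i)
    if len(branches) == 1:
        return {i: path for i in idx}  # split separated nothing: make a leaf
    leaves = {}
    for b, members in branches.items():
        leaves.update(assign_leaves(members, X, y, dist, rng, path + (b,), max_depth))
    return leaves


def proximities(X, y, dist, n_trees=50, seed=0):
    """Shared-leaf proximity: fraction of trees in which i and j share a leaf."""
    rng = random.Random(seed)
    n = len(X)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        leaves = assign_leaves(list(range(n)), X, y, dist, rng)
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Toy data: two well-separated classes in the plane.
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y = [0, 0, 0, 1, 1, 1]
prox = proximities(X, y, manhattan)
for row in prox:
    print([round(p, 2) for p in row])
```

Because the trees only ever call `dist`, replacing `manhattan` with a time-series or other domain-specific distance changes the proximities without touching the rest of the sketch, which is the kind of flexibility the abstract attributes to distance-based forests.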
Problem

Research questions and friction points this paper is trying to address.

Extends Random Forest proximities to all supervised distance-based learning contexts
Introduces a variant of Proximity Forest model for regression tasks
Provides meta-learning framework for supervised imputation using any classifier
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized Proximity Forest extends RF proximities universally
Introduces regression variant for supervised distance-based tasks
Meta-learning framework enables imputation with pre-trained classifiers
Ben Shaw
Dept. of Mathematics & Statistics, Utah State University
Adam Rustad
Dept. of Computer Science, Brigham Young University
Sofia Pelagalli Maia
Dept. of Statistics, Brigham Young University
Jake S. Rhodes
Brigham Young University
Machine Learning, Data Science
Kevin R. Moon
Utah State University
Machine Learning, Information Theory, Computational Biology, Signal Processing