🤖 AI Summary
Problem: Fréchet regression, designed for responses in metric spaces (e.g., distribution functions, SPD matrices, spherical data) paired with Euclidean predictors, relies in its local form on nonparametric kernel smoothing and therefore suffers from the curse of dimensionality.
Method: We propose a novel random-forest-weighted Fréchet regression framework that uses tree ensembles to generate adaptive local weights, and we develop both local-constant and local-linear estimators.
Contribution/Results: For the local-constant estimator, we establish consistency, a rate of convergence, and asymptotic normality, using technical tools that include infinite-order U-processes and infinite-order $M_{m_n}$-estimation theory. These results recover the existing large-sample theory of random forests as a special case when the responses lie in Euclidean space. In simulations with distribution functions, SPD matrices, and spherical data, and in applications to New York taxi data and human mortality data, the methods outperform existing Fréchet regression approaches.
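The local-constant estimator can be sketched as follows. The sketch assumes two things. First, it uses the common random forest kernel w_i(x) = (1/B) Σ_b 1{X_i ∈ L_b(x)} / |L_b(x)|, where L_b(x) is the leaf of tree b that contains x. Second, it takes distributional responses under the 2-Wasserstein metric, where the weighted Fréchet mean reduces to a weighted average of quantile functions. The function names, the toy leaf assignments, and the location-shift responses are hypothetical. The paper's own tree construction (splitting rule, subsampling) is not reproduced here.

```python
import numpy as np


def forest_weights(train_leaves, query_leaves):
    """Random forest kernel weights w_i(x) for a single query point x (hypothetical helper).

    train_leaves: (B, n) int array, leaf id of training point i in tree b.
    query_leaves: (B,) int array, leaf id of the query point x in tree b.
    Assumes every query leaf contains at least one training point.
    """
    same_leaf = train_leaves == query_leaves[:, None]   # (B, n) leaf co-membership
    leaf_sizes = same_leaf.sum(axis=1, keepdims=True)   # |L_b(x)| for each tree
    return (same_leaf / leaf_sizes).mean(axis=0)        # average over trees; sums to 1


def local_constant_wasserstein(weights, quantiles):
    """Weighted Frechet mean of 1-D distributions under the 2-Wasserstein metric.

    quantiles: (n, m) array; row i is the quantile function of Y_i on a fixed grid.
    With nonnegative weights summing to 1, the weighted average of quantile
    functions is non-decreasing, so it is a valid quantile function and equals
    the weighted Wasserstein barycenter.
    """
    return weights @ quantiles


# Hypothetical toy forest: B = 3 trees, n = 5 training points.
train_leaves = np.array([[0, 0, 1, 1, 1],
                         [2, 3, 3, 2, 2],
                         [4, 4, 4, 5, 5]])
query_leaves = np.array([0, 2, 4])
w = forest_weights(train_leaves, query_leaves)          # [7/18, 5/18, 1/9, 1/9, 1/9]

# Responses: uniform distributions on [mu_i - 1, mu_i + 1] (location shifts of one base).
grid = np.linspace(0.0, 1.0, 11)
base = 2.0 * grid - 1.0                                  # quantile function of U(-1, 1)
mu = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
quantiles = mu[:, None] + base[None, :]
q_hat = local_constant_wasserstein(w, quantiles)         # equals (w @ mu) + base
```

For scalar Euclidean responses y, the same weights give w @ y. This equals the average over trees of the plain leaf means of y, assuming each tree predicts with the unweighted mean of the training responses in its leaf. It illustrates the sense in which the Euclidean random forest appears as a special case.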
📝 Abstract
Statistical analysis is increasingly confronted with complex data from metric spaces. Petersen and Müller (2019) established a general paradigm of Fréchet regression with complex metric-space-valued responses and Euclidean predictors. However, the local approach therein involves nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, in this paper we propose a novel random forest weighted local Fréchet regression paradigm. Our approach relies mainly on a locally adaptive kernel generated by random forests. Our first method uses these weights to form a local average that estimates the conditional Fréchet mean, while the second method performs local linear Fréchet regression; both significantly improve on existing Fréchet regression methods. Based on the theory of infinite-order U-processes and infinite-order $M_{m_n}$-estimators, we establish the consistency, rate of convergence, and asymptotic normality of our local constant estimator, which covers the current large-sample theory of random forests with Euclidean responses as a special case. Numerical studies show the superiority of our methods for several commonly encountered types of responses, such as distribution functions, symmetric positive-definite matrices, and spherical data. The practical merits of our proposals are also demonstrated through applications to New York taxi data and human mortality data.
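The second, local linear method can be sketched under one plausible reading, in which the forest weights w_i(x) replace the kernel weights in the local linear construction of Petersen and Müller (2019). This gives s_i(x) = w_i(x)[1 - (d_i - m)^T C^{-1} m], with d_i = X_i - x, m = Σ_i w_i d_i, and C = Σ_i w_i (d_i - m)(d_i - m)^T. These weights sum to 1 but can be negative. For 2-Wasserstein responses, the minimizer of Σ_i s_i d_W^2(Y_i, ω) is therefore the projection of Σ_i s_i Q_i onto non-decreasing functions, which is computed here by isotonic regression (pool-adjacent-violators). The function names and toy weights are hypothetical, and the paper's exact weighting may differ.

```python
import numpy as np


def local_linear_weights(w, X, x):
    """Local linear weights s_i(x) built from base weights w (e.g., forest weights).

    w: (n,) nonnegative weights summing to 1; X: (n, p) predictors; x: (p,) query.
    Assumes the weighted covariance C is invertible.
    """
    d = X - x                                   # (n, p) predictors centered at x
    m = w @ d                                   # (p,) weighted mean of d
    C = (d - m).T @ (w[:, None] * (d - m))      # (p, p) weighted covariance
    return w * (1.0 - (d - m) @ np.linalg.solve(C, m))


def isotonic_increasing(v):
    """Pool-adjacent-violators: least-squares projection onto non-decreasing sequences."""
    blocks = []                                 # each block is [mean, count]
    for value in v:
        blocks.append([float(value), 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([np.full(c, m) for m, c in blocks])


def local_linear_wasserstein(w, X, x, quantiles):
    """Local linear Frechet estimate for 1-D distributions under 2-Wasserstein."""
    s = local_linear_weights(w, X, x)
    return isotonic_increasing(s @ quantiles)   # project onto valid quantile functions


# Hypothetical example: scalar predictor, distribution location linear in X (1 + 2 * X_i).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
w = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
x = np.array([1.5])
base = 2.0 * np.linspace(0.0, 1.0, 11) - 1.0    # quantile function of U(-1, 1)
quantiles = (1.0 + 2.0 * X) + base[None, :]     # (5, 11)
q_ll = local_linear_wasserstein(w, X, x, quantiles)  # equals 4.0 + base
```

With the same weights, the local-constant estimate w @ quantiles is centered at 1 + 2·(Σ_i w_i X_i) = 5 rather than at the true value 4. This illustrates the bias correction that the local linear step provides when x is not at the center of its weights.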