🤖 AI Summary
This study addresses statistical inference in regression settings where the response takes values in a general, possibly non-Euclidean, metric space while the predictors are Euclidean. Inference for global Fréchet regression in this setting has remained largely unexplored. The authors propose significance tests for both the global effect of the predictors and the partial effect of a subset of them, for metric-space-valued responses. To work around the absence of linear structure in metric spaces, they use random multipliers to obtain non-degenerate null distributions for the test statistics, together with the Cauchy combination method. The approach is evaluated in simulations on network data represented by graph Laplacians and on spherical data with geodesic distance, and is illustrated on New York City taxi transport networks and U.S. energy source compositional data.
📝 Abstract
Linear regression is widely used to model relationships between responses and predictors. In modern applications, one encounters data where the responses are non-Euclidean random objects situated in a metric space, paired with Euclidean predictors. Global Fréchet regression generalizes linear regression to such general settings; however, statistical inference for it has remained largely unexplored. We develop a significance test for the null hypothesis that the Fréchet regression function does not depend on the predictors, addressing the challenge posed by the absence of linear operations in metric spaces. We also develop a test for the partial effect of a subset of the predictors, analogous to, but quite different from, the partial F-tests commonly used in classical linear regression under Gaussian assumptions. The key ideas are to employ random multipliers to obtain non-degenerate null distributions for the proposed test statistics and to use the Cauchy combination method. We obtain consistency and convergence results under the null hypothesis and contiguous alternatives, and demonstrate the finite-sample performance of the proposed tests through simulations on network data represented by graph Laplacians and spherical data with geodesic distances. We further illustrate our method using transport networks arising from New York City taxi trip data and U.S. energy source compositional data.
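The abstract names the Cauchy combination method without spelling it out. As a rough illustration only, the generic rule combines p-values p_1, ..., p_k with nonnegative weights w_i summing to one through T = sum_i w_i tan((0.5 - p_i) * pi) and reports the combined p-value 0.5 - arctan(T) / pi. The sketch below implements that generic rule; the function name `cauchy_combination` is hypothetical, and how the individual p-values are formed in the proposed tests is not stated here, so this should not be read as the authors' procedure.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the generic Cauchy combination rule.

    Hypothetical helper for illustration; not taken from the paper.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / k] * k
    if len(weights) != k:
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to one")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    # Each p-value is mapped to a standard Cauchy quantile, then averaged.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Upper tail probability of a standard Cauchy variable at t.
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    print(cauchy_combination([0.01, 0.20, 0.65]))
```

Because the tangent transform is heavy-tailed, a single very small p-value dominates the sum, which is why the rule tolerates dependence among the component tests reasonably well.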