🤖 AI Summary
This paper addresses the construction of prediction sets for regression when the covariate $X$ lies in Euclidean space and the response $Y$ lies on a Riemannian manifold. We extend conformal inference to manifold-valued responses and propose a distribution-free, nonparametric method for building prediction sets. Theoretically, we prove that the empirical prediction set converges almost surely to its population counterpart on the manifold, and that it attains the nominal coverage level in finite samples. Methodologically, the approach combines Riemannian geometry, nonparametric statistics, and asymptotic analysis. Simulations and real-data analyses on covariance matrices and Grassmann manifolds demonstrate robustness and computational efficiency. The key contribution is the first theoretically grounded, general-purpose prediction set framework for manifold-valued regression, which removes conformal inference's reliance on Euclidean structure in the response space.
📝 Abstract
Regression on manifolds and, more broadly, statistics on manifolds have gained considerable importance in recent years because of the many applications that produce this type of data. Circular data is a classic example; others include data in the space of covariance matrices and data on the Grassmannian manifold obtained from principal component analysis. In this work we investigate prediction sets for regression when the response variable, denoted by $Y$, lies on a manifold and the covariate, denoted by $X$, lies in Euclidean space. This extends the ideas of [Lei and Wasserman, 2014] to this new setting. In keeping with the traditional principles of conformal inference, these prediction sets are distribution-free, meaning that no specific assumptions are imposed on the joint distribution of $(X, Y)$, and they are nonparametric. We prove that the empirical version of these regions converges almost surely, asymptotically, to its population counterpart on the manifold. The efficiency of the method is shown through a comprehensive simulation study and an analysis of real-world data.
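To make the idea of a distribution-free prediction set for a manifold-valued response concrete, here is a minimal sketch on the simplest manifold mentioned above, the circle. It is not the construction studied in the paper: it uses ordinary split conformal prediction with a geodesic-distance score, which yields a geodesic ball around a point prediction. The function names (`geodesic_dist`, `kernel_circular_mean`, `conformal_radius`), the Gaussian kernel, the bandwidth, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def geodesic_dist(a, b):
    """Arc-length distance between angles a and b on the unit circle."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def kernel_circular_mean(x0, x_train, y_train, bandwidth=0.2):
    """Gaussian-kernel-weighted mean direction at x0 (hypothetical point predictor)."""
    w = np.exp(-0.5 * ((x_train - x0) / bandwidth) ** 2)
    return np.arctan2(np.sum(w * np.sin(y_train)), np.sum(w * np.cos(y_train)))

def conformal_radius(x_train, y_train, x_cal, y_cal, alpha=0.1):
    """Split-conformal radius: a quantile of geodesic residuals on the calibration set."""
    preds = np.array([kernel_circular_mean(x, x_train, y_train) for x in x_cal])
    scores = np.sort(geodesic_dist(y_cal, preds))
    k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
    # If k exceeds the calibration size, the set is the whole circle (radius pi).
    return scores[k - 1] if k <= len(scores) else np.pi

# Simulated data (an assumption): the angle drifts with x, plus Gaussian noise.
rng = np.random.default_rng(0)

def simulate(n):
    x = rng.uniform(0.0, 1.0, n)
    y = (2 * np.pi * x + 0.3 * rng.standard_normal(n)) % (2 * np.pi)
    return x, y

x_train, y_train = simulate(500)
x_cal, y_cal = simulate(500)
x_test, y_test = simulate(2000)

alpha = 0.1
radius = conformal_radius(x_train, y_train, x_cal, y_cal, alpha)

# Prediction set at x: all angles within geodesic distance `radius` of the prediction.
test_preds = np.array([kernel_circular_mean(x, x_train, y_train) for x in x_test])
coverage = np.mean(geodesic_dist(y_test, test_preds) <= radius)
print(f"radius = {radius:.3f} rad, empirical coverage = {coverage:.3f}")
```

Under exchangeability of the calibration and test points, this split-conformal set covers the new response with probability at least $1 - \alpha$ whatever the joint distribution of $(X, Y)$ is. The only geometric ingredient is the distance function, so the same recipe carries over to other manifolds once a geodesic distance and a point predictor are available.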