🤖 AI Summary
High-resolution, continuous monitoring in personalized medicine and digital health generates multivariate, functional, and graph-structured biomarker data—posing challenges for conventional regression frameworks that assume Euclidean responses.
Method: We propose the first optimal subset selection framework for metric-space-valued responses—including Euclidean, functional, and graph-valued outcomes—extending best-subset selection to multivariate functional and random graph response settings. Our approach unifies integer programming, functional data analysis, and metric-space statistical modeling, supporting linear, quantile, and nonparametric additive regression.
Contribution/Results: Compared to state-of-the-art methods, our framework improves predictive accuracy and, especially, computational speed, with gains of several orders of magnitude for some response types such as functional responses. It retains strong statistical interpretability through exact subset selection and targets high-dimensional, complex-structured biomedical data.
📝 Abstract
Many problems in personalized medicine and digital health rely on the analysis of continuous-time functional biomarkers and other complex data structures that emerge from high-resolution patient monitoring. In this context, this work proposes new optimization-based variable selection methods, built on best-subset selection, for multivariate, functional, and more general metric-space-valued outcomes. Our framework applies to several types of regression models, including linear, quantile, and nonparametric additive models, and to a broad range of random responses, such as univariate and multivariate Euclidean data, functions, and random graphs. Our analysis demonstrates that the proposed methodology outperforms state-of-the-art methods in accuracy and, especially, in speed, achieving improvements of several orders of magnitude over competitors for various types of statistical responses, such as functional responses. Although the framework is general and not tailored to a specific regression model or scientific problem, the article is self-contained and focuses on biomedical applications. In clinical settings, it serves as a valuable resource for professionals in biostatistics, statistics, and artificial intelligence who are interested in variable selection in the current AI era.
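To make the idea of best-subset selection concrete, the following minimal sketch covers only the simplest case named above, a multivariate Euclidean response with a linear model. It is an illustration under stated assumptions, not the optimization method proposed in this work: it enumerates every subset of a fixed size by brute force, which is feasible only for a handful of predictors. The function name `best_subset`, the simulated data, and the choice of the squared Frobenius norm as the loss are all assumptions made for this example.

```python
from itertools import combinations

import numpy as np


def best_subset(X, Y, k):
    """Brute-force best-subset selection (illustrative, hypothetical helper).

    Fits Y on every size-k subset of the columns of X by least squares and
    returns the subset with the smallest residual sum of squares (RSS),
    together with that RSS. Y may have several columns (multivariate response).
    """
    p = X.shape[1]
    best_idx, best_rss = None, np.inf
    for idx in combinations(range(p), k):
        Xs = X[:, list(idx)]
        coef, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
        # Squared Frobenius norm of the residuals: the squared Euclidean
        # distance between observed and fitted responses, summed over rows.
        rss = float(np.sum((Y - Xs @ coef) ** 2))
        if rss < best_rss:
            best_idx, best_rss = idx, rss
    return best_idx, best_rss


# Simulated data (an assumption for the demo): 8 candidate predictors,
# a 3-dimensional response, and only predictors 1 and 4 carry signal.
rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.standard_normal((n, p))
B = np.zeros((p, 3))
B[1] = [2.0, -1.0, 0.5]
B[4] = [-1.5, 1.0, 2.0]
Y = X @ B + 0.1 * rng.standard_normal((n, 3))

idx, rss = best_subset(X, Y, k=2)
print("selected predictors:", idx)
```

Because the number of subsets grows combinatorially with the number of predictors, exhaustive enumeration of this kind does not scale, which is the reason optimization-based formulations are needed for realistic problem sizes.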