Abstract
This paper proposes distributed estimation procedures for three scalar-on-function regression models: the functional linear model (FLM), the functional non-parametric model (FNPM), and the functional partial linear model (FPLM). The framework addresses two key challenges in functional data analysis: the high computational cost of large samples and restrictions on sharing raw data across institutions. Monte Carlo simulations show that, for all three models, the distributed estimators substantially reduce computation time while preserving high estimation and prediction accuracy. When block sizes become too small, however, the FPLM overfits, which yields narrower prediction intervals and lower empirical coverage probability. An empirical study using the tecator dataset further supports these findings.
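The abstract does not spell out the distributed procedures themselves. Purely as an illustration of the general divide-and-conquer idea, and not the paper's estimator, the sketch below fits an FLM on each data block by projecting the curves onto a truncated Fourier basis and running least squares, then averages the block-wise coefficient vectors. Only those coefficient vectors would leave each block, never the raw curves or responses. The Fourier basis, the simple averaging rule, the simulated data, and all function names (`fourier_basis`, `fit_block`, `distributed_fit`, `beta_hat`) are assumptions made for this example.

```python
import numpy as np

def fourier_basis(t, K):
    """Return a (len(t), K) matrix of orthonormal Fourier basis functions on [0, 1)."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < K:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < K:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def design(X, t, K):
    """Approximate the scores Z_ij = integral of X_i(t) phi_j(t) dt by a Riemann sum on the grid."""
    dt = t[1] - t[0]
    return X @ fourier_basis(t, K) * dt

def fit_block(X, y, t, K):
    """Least-squares fit of y on an intercept plus the K basis scores of X (one block)."""
    Z = np.column_stack([np.ones(len(y)), design(X, t, K)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

def distributed_fit(X, y, t, K, m):
    """Hypothetical one-shot averaging: fit each of m blocks locally, then average.
    In a real deployment each block would live at a different site."""
    blocks = np.array_split(np.arange(len(y)), m)
    return np.mean([fit_block(X[idx], y[idx], t, K) for idx in blocks], axis=0)

def beta_hat(coef, t, K):
    """Reconstruct the estimated slope function beta(t) on the grid."""
    return fourier_basis(t, K) @ coef[1:]

# Simulated data (assumed setup): curves built from the first K basis functions.
rng = np.random.default_rng(0)
n, T, K, m = 2000, 100, 5, 10
t = np.linspace(0, 1, T, endpoint=False)
beta_true = np.sqrt(2) * np.sin(2 * np.pi * t)  # equals the second basis function
scores = rng.normal(size=(n, K)) / np.arange(1, K + 1)
X = scores @ fourier_basis(t, K).T
y = 1.0 + X @ beta_true / T + rng.normal(scale=0.1, size=n)  # Riemann-sum integral

coef_full = fit_block(X, y, t, K)           # centralized fit on all data
coef_dist = distributed_fit(X, y, t, K, m)  # averaged fit over m blocks
print("max |dist - full| coefficient gap:", np.max(np.abs(coef_dist - coef_full)))
print("max |beta_hat - beta| on grid:", np.max(np.abs(beta_hat(coef_dist, t, K) - beta_true)))
```

Averaging block-wise least-squares fits is the simplest one-shot combination rule. Each local fit here has K + 1 parameters, so shrinking the blocks toward that size makes every local estimate noisier, which is one reason block size matters in schemes of this kind.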