🤖 AI Summary
This study addresses the complex, heterogeneous relationship between physical activity and disease risk across subpopulations in large-scale mobile health data. The authors propose a generalized heterogeneous functional method that simultaneously estimates subgroup-specific functional effects and identifies latent subgroups within the generalized functional regression framework. A pre-clustering strategy, which partitions subjects more finely than the true subgroups, improves computational efficiency on large datasets. Applied to UK Biobank data from over 79,000 participants, the method identifies four subgroups for mental disorder outcomes and three subgroups for Parkinson's disease diagnosis, each with a scientific interpretation, and outperforms existing methods in future-day prediction accuracy.
📝 Abstract
Physical activity is crucial for human health. With the increasing availability of large-scale mobile health data, strong associations have been found between physical activity and various diseases. However, accurately capturing this relationship is challenging, possibly because it varies across subgroups of subjects, especially in large-scale datasets. To fill this gap, we propose a generalized heterogeneous functional method that simultaneously estimates functional effects and identifies subgroups within the generalized functional regression framework. The proposed method captures subgroup-specific functional relationships between physical activity and diseases, providing a more nuanced understanding of these associations. Additionally, we introduce a pre-clustering method that improves computational efficiency for large-scale data by first partitioning subjects into groups finer than the true subgroups. In the real data application, we examine the impact of physical activity on the risk of mental disorders and Parkinson's disease using the UK Biobank dataset, which includes over 79,000 participants. Our proposed method outperforms existing methods in future-day prediction accuracy and identifies four subgroups for mental disorder outcomes and three subgroups for Parkinson's disease diagnosis, with detailed scientific interpretations for each subgroup. We also establish the theoretical consistency of our method. Supplementary materials are available online. Code implementing the proposed method is available at: https://github.com/xiaojing777/GHFM.
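To make "subgroup-specific functional effects" concrete, the following is a minimal sketch of how such a model is commonly written in the generalized functional regression setting. The notation is an assumption introduced here for illustration and is not taken from the paper: $Y_i$ is the outcome for subject $i$, $X_i(t)$ the physical activity trajectory over a time domain $\mathcal{T}$, $g$ a link function, $k_i \in \{1,\dots,K\}$ a latent subgroup label, and $\mathcal{C}_1,\dots,\mathcal{C}_M$ hypothetical pre-clusters with $M \ge K$.

```latex
% Illustrative notation only (assumed, not the authors' formulation).
% Subgroup-specific generalized functional regression:
\[
  g\bigl(\mathbb{E}[\,Y_i \mid X_i\,]\bigr)
    = \alpha + \int_{\mathcal{T}} X_i(t)\,\beta_{k_i}(t)\,dt,
  \qquad k_i \in \{1,\dots,K\}.
\]
% Pre-clustering idea: if each pre-cluster lies inside a single true subgroup,
% all subjects in it share one coefficient function, so subgroup identification
% can operate on M pre-clusters rather than on n individual subjects:
\[
  \beta_{k_i}(t) = \gamma_m(t) \quad \text{for all } i \in \mathcal{C}_m,
  \qquad m = 1,\dots,M, \quad K \le M \ll n.
\]
```

Under this reading, subjects with the same label $k_i$ share one coefficient function $\beta_{k}(t)$, and a partition finer than the true subgroups reduces the number of distinct coefficient functions to compare from $n$ to $M$.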