🤖 AI Summary
Addressing the challenges of modeling nonlinear genotype–phenotype relationships, poor interpretability, and low computational efficiency in complex genetic risk prediction, this paper introduces the neural tangent kernel (NTK) to genetic analysis for the first time, proposing two novel models: NTK-LMM and NTK-KRR. Methodologically, the approach combines the expressive power of the NTK with the statistical interpretability of linear mixed models (LMM) fitted by MINQUE estimation and of kernel ridge regression (KRR) with cross-validated regularization. Evaluated on simulated data and the ADNI cohort, the models achieve higher prediction accuracy for hippocampal volume and entorhinal cortex thickness than conventional linear models and black-box neural networks. Moreover, they enable efficient training, heritability decomposition, and variant-level effect localization, thereby jointly achieving high representational capacity, statistical transparency, and computational feasibility.
📝 Abstract
Given the complexity of genetic risk prediction, there is a critical need for novel methods that capture intricate (e.g., nonlinear) genotype–phenotype relationships while remaining statistically interpretable and computationally tractable. We develop a Neural Tangent Kernel (NTK) framework that integrates kernel methods with deep neural networks for genetic risk prediction analysis. We consider two approaches: NTK-LMM, which embeds the empirical NTK in a linear mixed model with variance components estimated via minimum norm quadratic unbiased estimation (MINQUE), and NTK-KRR, which performs kernel ridge regression with cross-validated regularization. Through simulation studies, we show that NTK-based models outperform traditional neural network models and linear mixed models. Applying NTK to endophenotypes (e.g., hippocampal volume) and AD-related genes (e.g., APOE) from the Alzheimer's Disease Neuroimaging Initiative (ADNI), we find that NTK achieves higher accuracy than existing methods for hippocampal volume and entorhinal cortex thickness. In addition to its accuracy, NTK has favorable optimization properties (i.e., closed-form or convex training) and generates interpretable results due to its connection to variance components and heritability. Overall, our results indicate that by integrating the strengths of both deep neural networks and kernel methods, NTK offers competitive performance for genetic risk prediction analysis while retaining the advantages of interpretability and computational efficiency.
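To make the NTK-KRR idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-hidden-layer tanh network f(x) = aᵀ tanh(Wx) / √m at random initialization, for which the empirical NTK (the inner product of parameter gradients) can be written analytically, and then fits kernel ridge regression in closed form with the ridge penalty chosen by k-fold cross-validation. The architecture, the simulated genotypes and phenotype, the penalty grid, and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_ntk(X1, X2, W, a):
    """Empirical NTK of f(x) = a^T tanh(W x) / sqrt(m) at parameters (W, a)."""
    m = W.shape[0]
    H1, H2 = np.tanh(X1 @ W.T), np.tanh(X2 @ W.T)   # hidden activations, (n, m)
    # Gradients w.r.t. a are tanh(Wx)/sqrt(m), giving H1 H2^T / m.
    K_a = H1 @ H2.T / m
    # Gradients w.r.t. row w_j are a_j * (1 - tanh^2(w_j.x)) * x / sqrt(m).
    D1, D2 = (1.0 - H1**2) * a, (1.0 - H2**2) * a
    K_W = (D1 @ D2.T) * (X1 @ X2.T) / m
    return K_a + K_W

def krr_fit_predict(K_train, y_train, K_test_train, lam):
    """Closed-form kernel ridge regression: alpha = (K + lam I)^{-1} y."""
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + lam * np.eye(n), y_train)
    return K_test_train @ alpha

def cv_select_lambda(K, y, lambdas, n_folds=5):
    """Pick the ridge penalty with the lowest k-fold mean squared error."""
    idx = np.arange(len(y))
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for lam in lambdas:
        sq_err = 0.0
        for held_out in folds:
            train = np.setdiff1d(idx, held_out)
            pred = krr_fit_predict(K[np.ix_(train, train)], y[train],
                                   K[np.ix_(held_out, train)], lam)
            sq_err += np.sum((pred - y[held_out]) ** 2)
        scores.append(sq_err / len(y))
    return lambdas[int(np.argmin(scores))]

# Simulated data (assumption): 200 individuals, 50 SNPs coded 0/1/2, MAF 0.3.
n, p, m = 200, 50, 256
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)   # standardize each SNP
# Hypothetical phenotype with one nonlinear interaction and one additive effect.
y = np.sin(X[:, 0] * X[:, 1]) + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(n)

# Random network parameters at which the empirical NTK is evaluated.
W = rng.standard_normal((m, p)) / np.sqrt(p)
a = rng.standard_normal(m)

train, test = np.arange(150), np.arange(150, n)
K_train = empirical_ntk(X[train], X[train], W, a)
K_test_train = empirical_ntk(X[test], X[train], W, a)

# Center the phenotype with the training mean only, to avoid test leakage.
y_mean = y[train].mean()
lambdas = [1e-2, 1e-1, 1.0, 10.0]
best_lam = cv_select_lambda(K_train, y[train] - y_mean, lambdas)
y_pred = y_mean + krr_fit_predict(K_train, y[train] - y_mean, K_test_train, best_lam)

print("selected lambda:", best_lam)
print("test MSE:", float(np.mean((y_pred - y[test]) ** 2)))
```

Because the fit reduces to a single linear solve, training is convex and closed-form, which is the optimization property the abstract refers to. The same kernel matrix could in principle serve as the covariance structure of a random effect in a linear mixed model, which is the role it plays in NTK-LMM; that variance-component step is not shown here.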