🤖 AI Summary
Predicting complex phenotypes such as grapevine leaf hair density and trichome density from SNP data remains challenging under variable field conditions and across years, because models often lack robustness. This work proposes LiT-G2P, a framework that combines a linear model, which captures additive genetic effects, with a Transformer architecture that models nonlinear SNP–SNP interactions. Using genome-wide SNP data, attention mechanisms, and genotype-stratified analysis, LiT-G2P achieves single-year and cross-year root mean square errors (RMSEs) of 0.469 and 0.454 for hair density, with tolerance accuracies of 79.2% and 74.6%, respectively, the lowest errors among the compared baselines. In addition, SNPs prioritized by the model's attention weights provide interpretable candidate markers for downstream validation, supporting both predictive performance and genomic interpretability in perennial crop breeding.
📝 Abstract
Robust genotype-to-phenotype (G2P) prediction is essential for accelerating breeding decisions and genetic gain. However, it remains challenging to measure complex traits under variable field conditions and across years. In this study, we propose LiT-G2P (Linear-Transformer Genotype-to-Phenotype), an automated predictive framework that integrates additive genetic effects with Transformer-based nonlinear interactions using genome-wide single-nucleotide polymorphism (SNP) data. We evaluated LiT-G2P on a panel of diverse grape accessions, genotyped with SNP markers and phenotyped across two consecutive years. Target traits include grapevine leaf hair density and trichome density. In both single-year and cross-year testing scenarios, LiT-G2P consistently improves prediction performance over baseline models. For hair density, LiT-G2P achieves the lowest error in both single-year and cross-year evaluations, with RMSEs of 0.469 and 0.454 and tolerance accuracies of 79.2% and 74.6%, respectively. For trichome density, LiT-G2P also delivers the best overall prediction performance. In addition, we extract model-prioritized SNPs from attention weights and apply genotype-stratified analysis to provide interpretable candidate markers for downstream validation. These results demonstrate that integrating stable additive effects with learned interaction patterns can enhance cross-year robustness and support practical SNP-based predictive modeling for genomic selection.
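As a reading aid for the two reported metrics, the following minimal sketch shows how an RMSE and a tolerance accuracy could be computed for a set of predictions. The abstract does not define tolerance accuracy, so the definition used here (the share of predictions falling within a fixed tolerance of the observed value) and the threshold `tol=0.5` are assumptions, as are the function names and the toy data; none of this is taken from the LiT-G2P implementation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted trait values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def tolerance_accuracy(y_true, y_pred, tol=0.5):
    """ASSUMED definition: fraction of predictions within +/- tol of the
    observed value. The paper's actual definition and threshold may differ."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


# Hypothetical trait scores for four accessions (illustrative only).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.25, 2.0, 2.0, 4.0]

print(f"RMSE: {rmse(observed, predicted):.3f}")
print(f"Tolerance accuracy: {tolerance_accuracy(observed, predicted):.1%}")
```

Under this assumed definition the two metrics are complementary: RMSE penalizes large misses heavily, while tolerance accuracy only counts how many accessions are predicted closely enough, so a model can improve one without improving the other.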