🤖 AI Summary
Predicting genotype-to-phenotype mappings remains challenging due to the difficulty of modeling high-order epistasis and genotype-by-environment (G×E) interactions; conventional linear and pairwise epistatic models have limited representational capacity. To address this, we introduce, for the first time in quantitative genetics, a Transformer-based framework for joint multi-environment modeling that enables cross-environment few-shot transfer prediction. The method explicitly captures high-order nonlinear genetic interactions and G×E effects, exposing them through interpretable attention weights. We validate it on simulated data and a yeast QTL dataset. Under strong epistasis, our approach significantly outperforms standard regression and state-of-the-art epistatic models; moreover, it achieves accurate phenotypic prediction with only a small number of target-environment samples. The framework thus combines strong generalization across environments with biologically meaningful interpretability via its attention mechanisms.
📝 Abstract
Predicting phenotype from genotype is a central challenge in genetics. Traditional approaches in quantitative genetics typically analyze this problem using methods based on linear regression. These methods generally assume that the genetic architecture of complex traits can be parameterized in terms of an additive model, where the effects of loci are independent, plus (in some cases) pair-wise epistatic interactions between loci. However, these models struggle to analyze more complex patterns of epistasis or subtle gene-environment interactions. Recent advances in machine learning, particularly attention-based models, offer a promising alternative. Initially developed for natural language processing, attention-based models excel at capturing context-dependent interactions and have shown exceptional performance in predicting protein structure and function. Here, we apply attention-based models to quantitative genetics. We analyze the performance of this attention-based approach in predicting phenotype from genotype using simulated data across a range of models with increasing epistatic complexity, and using experimental data from a recent quantitative trait locus mapping study in budding yeast. We find that our model demonstrates superior out-of-sample predictions in epistatic regimes compared to standard methods. We also explore a more general multi-environment attention-based model to jointly analyze genotype-phenotype maps across multiple environments and show that such architectures can be used for “transfer learning” – predicting phenotypes in novel environments with limited training data.
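To make the idea concrete, below is a minimal NumPy sketch of the kind of architecture the abstract describes: each locus's allele is embedded, a self-attention layer lets loci interact (so the model can, in principle, represent high-order epistasis), and a pooled readout predicts a scalar phenotype. This is an illustrative, untrained toy with randomly initialized weights; the class name, dimensions, and single-head design are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TinyGenotypeAttention:
    """Toy single-head self-attention over loci, mean-pooled to a phenotype scalar."""

    def __init__(self, n_loci, d=16):
        self.allele_emb = rng.normal(0, 0.1, (2, d))    # embeddings for allele 0 / allele 1
        self.pos_emb = rng.normal(0, 0.1, (n_loci, d))  # one positional embedding per locus
        self.Wq = rng.normal(0, 0.1, (d, d))            # query projection
        self.Wk = rng.normal(0, 0.1, (d, d))            # key projection
        self.Wv = rng.normal(0, 0.1, (d, d))            # value projection
        self.w_out = rng.normal(0, 0.1, d)              # linear readout to phenotype

    def forward(self, genotype):
        """genotype: (n_loci,) integer array of 0/1 alleles -> (phenotype, attention matrix)."""
        x = self.allele_emb[genotype] + self.pos_emb            # (n_loci, d) locus tokens
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))          # (n_loci, n_loci) locus-locus weights
        h = attn @ v                                            # context-mixed locus representations
        return float(h.mean(axis=0) @ self.w_out), attn
```

The returned attention matrix is the object the abstract refers to when it discusses interpretability: after training, large entries would indicate pairs of loci the model treats as interacting. A multi-environment variant could, for example, add an environment token or environment-specific readout, which is how the transfer-learning setting described above could be accommodated.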