🤖 AI Summary
Wheat cultivar performance varies across environments, which threatens reliable food production and stable incomes for growers. To address this, the study uses Gaussian process (GP) models for genotype-by-environment (G×E) prediction of wheat yield and grain protein content. The approach is closely related to the linear mixed-effect models common in plant breeding, but in addition to Euclidean-based kernels it tests covariance kernels designed for strings and time series. The resulting models predict competitively for both new varieties and new environmental conditions, even with little or no previous data for the new variety or conditions. The test case is a novel wheat dataset collected in Switzerland, and the authors present the approach as extensible to other agricultural applications, with the aim of supporting decision-making and data acquisition strategies.
📝 Abstract
Optimizing wheat variety selection for high performance in different environmental conditions is critical for reliable food production and stable incomes for growers. We employ a statistical machine learning framework utilizing Gaussian Process (GP) models to capture the effects of genetic and environmental factors on wheat yield and protein content. Selecting covariance kernels suited to the distinct characteristics of each type of input is essential. The GP approach is closely related to linear mixed-effect models for genotype × environment predictions, where random additive and interaction effects are modeled with covariance structures. However, while linear mixed-effect models commonly used in plant breeding rely on Euclidean-based kernels, we also test kernels specifically designed for strings and time series. The resulting GP models are capable of competitively predicting outcomes for (1) new environmental conditions and (2) new varieties, even in scenarios with little to no previous data for the new conditions or variety. While we focus on a wheat test case using a novel dataset collected in Switzerland, the GP approach presented here can be applied and extended to a wide range of agricultural applications and beyond, paving the way for improved decision-making and data acquisition strategies.
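
To make the kernel idea concrete, here is a minimal sketch of GP regression with a genotype × environment covariance. It is not the authors' implementation. The abstract does not specify the kernels, so everything below is an assumption chosen for illustration: a normalised k-mer spectrum kernel stands in for a string kernel on genotypes, an RBF kernel on summary covariates stands in for the environmental kernel (a time-series kernel could replace it), and the marker strings, covariates, yields, and function names are all hypothetical toy values. Additive and interaction effects are combined as `kg + ke + kg * ke`, mirroring the additive and interaction terms of a mixed-effect model.

```python
import numpy as np
from collections import Counter

def spectrum_kernel(a, b, k=3):
    """Normalised k-mer spectrum kernel between two strings (assumed stand-in)."""
    def counts(s):
        return Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ca, cb = counts(a), counts(b)
    dot = sum(ca[m] * cb[m] for m in ca)  # Counter returns 0 for missing k-mers
    norm = np.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return dot / norm

def rbf_kernel(x, y, lengthscale=1.0):
    """Euclidean RBF kernel on environmental covariate vectors."""
    d2 = np.sum((np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gxe_kernel(p, q):
    """Additive genotype and environment effects plus their interaction."""
    kg = spectrum_kernel(p[0], q[0])
    ke = rbf_kernel(p[1], q[1])
    return kg + ke + kg * ke

def gram(A, B):
    """Kernel matrix between two lists of (genotype, environment) pairs."""
    return np.array([[gxe_kernel(a, b) for b in B] for a in A])

def gp_predict(train, y, test, noise=0.1):
    """Posterior mean and latent variance of a GP with a constant mean."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    K = gram(train, train) + noise * np.eye(len(train))
    Ks = gram(test, train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean))
    mu = mean + Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.array([gxe_kernel(t, t) for t in test]) - np.sum(v ** 2, axis=0)
    return mu, var

# Hypothetical toy data: (marker string, standardised [temperature, rainfall]).
train = [
    ("AACGTTGA", [0.2, -0.5]),
    ("AACGTTGA", [1.1, 0.3]),
    ("AACGATGA", [0.2, -0.5]),
    ("TTCGATCC", [1.1, 0.3]),
    ("TTCGATCC", [-0.8, 0.9]),
]
yield_t_per_ha = [6.1, 5.4, 6.3, 4.9, 5.8]  # made-up values

# A variety and an environment that both never appear in the training set.
test = [("AACGATCC", [0.5, 0.0])]
mu, var = gp_predict(train, yield_t_per_ha, test)
print("predicted yield:", mu[0], "latent variance:", var[0])
```

Because sums and products of valid kernels are themselves valid kernels, the combined covariance stays positive semi-definite. In practice the noise level, length scale, and relative weights of the three terms would be estimated from data, for example by maximising the marginal likelihood, rather than fixed as they are here.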