AI Summary
Precision breeding urgently requires efficient computational tools to predict the functional effects of crop genetic variants. This study presents an end-to-end pipeline for predicting variant effects in rice that integrates the DeepChem-Variant deep learning model with crop-specific gene annotations from RAP-DB and a deleteriousness score built from the Grantham distance and the BLOSUM62 substitution matrix. The scoring method operates without reliance on external databases, and the pipeline is extensible to other crop species. Applied to the OsMT-3a gene, the pipeline predicted the functional impact of all 1,509 possible single-nucleotide variants within ten days and classified them into high-, medium-, and low-impact categories, roughly two orders of magnitude faster than the 2-4 years estimated for conventional wet-lab experiments.
Abstract
Predicting functional consequences of genetic variants in crop genes remains a critical bottleneck for precision breeding programs. We present AgriVariant, an end-to-end pipeline for variant-effect prediction in rice (Oryza sativa) that addresses the lack of crop-specific variant-interpretation tools and can be extended to any crop species with available reference genomes and gene annotations. Our approach integrates deep learning-based variant calling (DeepChem-Variant) with custom plant genomics annotation using RAP-DB gene models and database-independent deleteriousness scoring that combines the Grantham distance and the BLOSUM62 substitution matrix. We validate the pipeline through targeted mutations in stress-response genes (OsDREB2a, OsDREB1F, SKC1), demonstrating correct classification of stop-gained, missense, and synonymous variants with appropriate HIGH / MODERATE / LOW impact assignments. An exhaustive mutagenesis study of OsMT-3a analyzed all 1,509 possible single-nucleotide variants in 10 days, identifying 353 high-impact, 447 medium-impact, and 709 low-impact variants, an analysis that would have required 2-4 years using traditional wet-lab approaches. This computational framework enables breeders to prioritize variants for experimental validation across diverse crop species, reducing screening costs and accelerating development of climate-resilient crop varieties.
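The exhaustive-mutagenesis step can be illustrated with a minimal Python sketch. It is not the AgriVariant implementation: the function names (`enumerate_snvs`, `classify_snv`) and the toy sequence are hypothetical, and the impact tiers follow only the consequence-type mapping named in the validation sentence (stop-gained as HIGH, missense as MODERATE, synonymous as LOW). The Grantham and BLOSUM62 scoring that the pipeline layers on top is omitted, and the `stop_lost` branch is an added assumption so that the function handles sequences that include a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def enumerate_snvs(cds):
    """Yield (position, ref, alt) for every possible single-nucleotide substitution."""
    for pos, ref in enumerate(cds):
        for alt in BASES:
            if alt != ref:
                yield pos, ref, alt


def classify_snv(cds, pos, alt):
    """Return (consequence, impact) for substituting `alt` at 0-based `pos` of an in-frame CDS."""
    if len(cds) % 3 != 0 or set(cds) - set(BASES):
        raise ValueError("expected an in-frame, upper-case A/C/G/T coding sequence")
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3]
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous", "LOW"
    if alt_aa == "*":
        return "stop_gained", "HIGH"
    if ref_aa == "*":
        # Assumption: not one of the three classes named in the abstract.
        return "stop_lost", "HIGH"
    return "missense", "MODERATE"


if __name__ == "__main__":
    toy_cds = "ATGTGGAAA"  # hypothetical 9-nt sequence (Met-Trp-Lys), not a real gene
    for pos, ref, alt in enumerate_snvs(toy_cds):
        print(pos, ref, alt, *classify_snv(toy_cds, pos, alt))
```

A sequence of n nucleotides has 3n possible single-nucleotide substitutions, so the 1,509 variants reported for OsMT-3a are consistent with a 503-nucleotide target, although the abstract does not state the sequence length.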