Interpreting artificial neural networks to detect genome-wide association signals for complex traits

📅 2024-07-26
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Conventional genome-wide association studies (GWAS) rely on simplifying assumptions such as linearity and the absence of epistasis, which limits their ability to detect non-linear genetic effects and epistatic interactions. Method: We train artificial neural networks to predict complex traits from simulated and real genotype-phenotype data, extract feature importance scores with several post hoc interpretability methods to identify potentially associated loci (PAL), and devise an approach for assigning p-values to the detected PAL. Results: In simulations, associated loci were detected with good precision under strict selection criteria. Applied to the schizophrenia cohort of the Estonian Biobank, the approach detected multiple loci that showed significant concordance with loci previously associated with schizophrenia and bipolar disorder, and genes within these PAL were enriched mainly for terms related to brain morphology and function. The method is intended to complement standard GWAS and to serve as an initial screening tool for subsequent functional studies.

📝 Abstract
Investigating the genetic architecture of complex diseases is challenging due to the multifactorial and interactive landscape of genomic and environmental influences. Although genome-wide association studies (GWAS) have identified thousands of variants for multiple complex traits, conventional statistical approaches can be limited by simplified assumptions such as linearity and lack of epistasis in models. In this work, we trained artificial neural networks to predict complex traits using both simulated and real genotype-phenotype datasets. We extracted feature importance scores via different post hoc interpretability methods to identify potentially associated loci (PAL) for the target phenotype and devised an approach for obtaining p-values for the detected PAL. Simulations with various parameters demonstrated that associated loci can be detected with good precision using strict selection criteria. By applying our approach to the schizophrenia cohort in the Estonian Biobank, we detected multiple loci associated with this highly polygenic and heritable disorder. There was significant concordance between PAL and loci previously associated with schizophrenia and bipolar disorder, with enrichment analyses of genes within the identified PAL predominantly highlighting terms related to brain morphology and function. With advancements in model optimization and uncertainty quantification, artificial neural networks have the potential to enhance the identification of genomic loci associated with complex diseases, offering a more comprehensive approach for GWAS and serving as initial screening tools for subsequent functional studies.
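As a rough illustration of the general idea, and not the authors' pipeline, the sketch below trains a small neural network on toy genotype dosages and scores each SNP with gradient × input, one common post hoc attribution method. The simulation (one additive SNP plus one epistatic pair), the indices in `CAUSAL`, the network size and training settings, and all function names are hypothetical choices made for this example; only NumPy is assumed.

```python
import numpy as np

# Hypothetical causal SNP indices for the toy simulation (not from the paper).
CAUSAL = (3, 17, 29)

def simulate(n=1000, p=40, seed=0):
    """Toy genotypes (0/1/2 dosages): one additive SNP plus one epistatic pair."""
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.25, 0.5, size=p)               # minor allele frequencies
    G = rng.binomial(2, maf, size=(n, p)).astype(float)
    a, b, c = CAUSAL
    y = G[:, a] + G[:, b] * G[:, c] + rng.normal(0.0, 1.0, size=n)
    return G, y

def train_mlp(X, y, hidden=16, epochs=800, lr=0.1, wd=1e-3, seed=0):
    """One-hidden-layer tanh network fit by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(p), size=(p, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden activations, (n, hidden)
        err = H @ w2 + b2 - y                    # residuals, (n,)
        dH = np.outer(err, w2) * (1.0 - H**2)    # backprop through tanh
        # All gradients below use values computed before any update this step.
        W1 -= lr * (X.T @ dH / n + wd * W1)      # small L2 weight decay
        b1 -= lr * dH.mean(axis=0)
        w2 -= lr * (H.T @ err / n + wd * w2)
        b2 -= lr * err.mean()
    return W1, b1, w2, b2

def gradient_x_input(X, params):
    """Mean |d(prediction)/dx_j * x_j| over samples: one score per SNP."""
    W1, b1, w2, _ = params
    H = np.tanh(X @ W1 + b1)
    grads = ((1.0 - H**2) * w2) @ W1.T           # analytic input gradients, (n, p)
    return np.abs(grads * X).mean(axis=0)

def importance_scores(G, y):
    """Standardize inputs and phenotype, train the network, then attribute."""
    X = (G - G.mean(axis=0)) / G.std(axis=0)
    ys = (y - y.mean()) / y.std()
    return gradient_x_input(X, train_mlp(X, ys))

if __name__ == "__main__":
    G, y = simulate()
    scores = importance_scores(G, y)
    print(np.argsort(scores)[::-1][:5])          # five highest-scoring SNPs
```

Gradient × input is used here only because the input gradients of a one-hidden-layer network have a simple closed form; other post hoc methods (for example SHAP, DeepLIFT or integrated gradients) could be swapped in at the `gradient_x_input` step. Because the network sees all SNPs jointly, the interacting pair can receive high scores even though neither SNP is modelled by an explicit interaction term.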
Problem

Research questions and friction points this paper is trying to address.

Detect genome-wide association signals
Interpret artificial neural networks
Identify loci for complex traits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Artificial neural networks predict complex traits
Feature importance scores identify associated loci
P-values obtained for detected loci
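The abstract does not spell out how the p-values are obtained. One generic way to attach p-values to importance scores, shown here as an assumption rather than the authors' procedure, is a permutation test: recompute the scores many times after shuffling the phenotype and compare the observed scores with this null distribution. `score_fn` stands for any hypothetical "train a network, then run an attribution method" routine; `abs_corr` is only a cheap stand-in so the example runs quickly. All names are illustrative.

```python
import numpy as np

def permutation_null(score_fn, G, y, n_perm=99, seed=0):
    """Null score matrix (n_perm, p) from rescoring with shuffled phenotypes.

    score_fn(G, y) -> 1-D array of per-SNP scores is a hypothetical hook.
    """
    rng = np.random.default_rng(seed)
    return np.stack([score_fn(G, rng.permutation(y)) for _ in range(n_perm)])

def empirical_pvalues(observed, null_scores, family_wise=True):
    """Per-SNP p = (1 + #{null >= observed}) / (1 + n_perm).

    family_wise=True compares each SNP with the per-permutation maximum over
    all SNPs (single-step max-statistic), which aims to control the
    family-wise error rate; False compares each SNP with its own null column.
    """
    observed = np.asarray(observed, dtype=float)
    null_scores = np.asarray(null_scores, dtype=float)
    if family_wise:
        ref = null_scores.max(axis=1, keepdims=True)   # (n_perm, 1)
    else:
        ref = null_scores                              # (n_perm, p)
    exceed = (ref >= observed[None, :]).sum(axis=0)    # counts per SNP
    return (exceed + 1.0) / (null_scores.shape[0] + 1.0)

def abs_corr(G, y):
    """Stand-in score: |Pearson correlation| of each SNP with the phenotype."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Gc.T @ yc) / (np.linalg.norm(Gc, axis=0) * np.linalg.norm(yc))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = rng.binomial(2, 0.3, size=(500, 20)).astype(float)
    y = G[:, 4] + rng.normal(size=500)                 # SNP 4 is causal (toy)
    null = permutation_null(abs_corr, G, y, n_perm=49)
    print(empirical_pvalues(abs_corr(G, y), null))
```

The +1 in numerator and denominator keeps every p-value strictly positive, so the smallest attainable value is 1/(n_perm + 1). The max-statistic option is more conservative than per-SNP comparison but accounts for scoring all loci at once. With a real network, each permutation means retraining a model, so the number of permutations bounds the p-value resolution.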
Burak Yelmen
Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
Maris Alver
Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
Estonian Biobank Research Team
Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
Flora Jay
CNRS, INRIA, LISN, Paris-Saclay University, Orsay, France
Lili Milani
Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia