TorchGWAS: GPU-accelerated GWAS for thousands of quantitative phenotypes

2026-04-22

AI Summary
This study addresses the computational cost of running traditional genome-wide association study (GWAS) methods across large panels of quantitative phenotypes, which limits high-throughput genetic analyses. The authors propose a PyTorch-based, GPU-accelerated linear GWAS framework that uses NVIDIA GPUs for parallel computation. The framework accepts genotype input in NumPy, PLINK, and BGEN formats and includes built-in sample alignment and covariate adjustment. In the reported benchmark, it completes 2,048 phenotypes in 10 minutes and 20,480 phenotypes in 20 minutes on a single A100 GPU, a roughly 300- to 1,700-fold increase in phenotype throughput over fastGWA on a 64-core CPU. This makes the framework well suited to screening workflows that involve thousands of quantitative traits.
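
The fold-increase range can be reproduced from the benchmark figures quoted in the abstract. The short check below is only an illustration: it assumes the CPU baseline processes phenotypes one after another at the reported rate of about 100 seconds each, and the variable and function names are made up for this example.

```python
# Benchmark figures quoted in the abstract (approximate values).
CPU_SECONDS_PER_PHENOTYPE = 100  # fastGWA on the 64-core CPU

# Number of phenotypes -> reported GPU wall time in seconds.
gpu_runs = {2_048: 10 * 60, 20_480: 20 * 60}


def throughput_gain(n_phenotypes, gpu_seconds):
    """Estimated serial CPU time divided by the reported GPU time."""
    return n_phenotypes * CPU_SECONDS_PER_PHENOTYPE / gpu_seconds


gains = {n: throughput_gain(n, t) for n, t in gpu_runs.items()}
for n, gain in gains.items():
    print(f"{n} phenotypes: about {gain:.0f}x")
```

Under that assumption the smaller panel gives a gain of about 341 and the larger panel about 1,707, which matches the quoted range of roughly 300 to 1,700.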

๐Ÿ“ Abstract
Motivation: Modern bioinformatics workflows, particularly in imaging and representation learning, can generate thousands to tens of thousands of quantitative phenotypes from a single cohort. In such settings, running genome-wide association analyses trait by trait rapidly becomes a computational bottleneck. While established GWAS tools are highly effective for individual traits, they are not optimized for phenotype-rich screening workflows in which the same genotype matrix is reused across a large phenotype panel.
Results: We present TorchGWAS, a framework for high-throughput association testing of large phenotype panels through hardware acceleration. The current public release provides stable Python and command-line workflows for linear GWAS and multivariate phenotype screening, supports NumPy, PLINK, and BGEN genotype inputs, aligns phenotype and covariate tables by sample identifier, and performs covariate adjustment internally. In a benchmark with 8.9 million markers and 23,000 samples, fastGWA required approximately 100 seconds per phenotype on an AMD EPYC 7763 64-core CPU, whereas TorchGWAS completed 2,048 phenotypes in 10 minutes and 20,480 phenotypes in 20 minutes on a single NVIDIA A100 GPU, corresponding to an approximately 300- to 1,700-fold increase in phenotype throughput. TorchGWAS therefore makes large-scale GWAS screening practical in phenotype-rich settings where thousands of quantitative traits must be evaluated efficiently.
Availability and implementation: TorchGWAS is implemented in Python and distributed as a documented source repository at https://github.com/ZhiGroup/TorchGWAS. The current release provides a command-line interface, packaged source code, tutorials, benchmark scripts, and example workflows.
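
To make the idea of reusing one genotype matrix across a whole phenotype panel concrete, the following is a minimal sketch of batched linear association testing with internal covariate adjustment. It is not taken from the TorchGWAS code base and does not use its API: the function names `residualize` and `batched_linear_gwas` are hypothetical, the data are simulated, and NumPy is used here for portability (the same matrix operations could be expressed with PyTorch tensors on a GPU). The sketch relies on the standard result that, after projecting the covariates out of both genotypes and phenotypes, the t statistic of each marker follows from the partial correlation.

```python
import numpy as np


def residualize(M, C):
    """Remove an intercept and the covariates C from every column of M."""
    X = np.column_stack([np.ones(C.shape[0]), C])
    Q, _ = np.linalg.qr(X)  # orthonormal basis of the covariate space
    return M - Q @ (Q.T @ M)


def batched_linear_gwas(G, Y, C):
    """Return t statistics with shape (n_markers, n_phenotypes).

    G: (n_samples, n_markers) genotype dosages
    Y: (n_samples, n_phenotypes) quantitative phenotypes
    C: (n_samples, n_covariates) covariates (intercept added internally)
    Assumes no missing values and no monomorphic markers.
    """
    n, k = C.shape
    dof = n - k - 2  # intercept + k covariates + the tested marker
    Gr = residualize(G, C)
    Yr = residualize(Y, C)
    Gr /= np.linalg.norm(Gr, axis=0, keepdims=True)
    Yr /= np.linalg.norm(Yr, axis=0, keepdims=True)
    R = Gr.T @ Yr  # one matrix product gives all partial correlations
    return R * np.sqrt(dof / (1.0 - R**2))


# Small simulated example (sizes chosen only for illustration).
rng = np.random.default_rng(0)
n_samples, n_markers, n_phenotypes, n_covariates = 200, 50, 8, 3
G = rng.binomial(2, 0.3, size=(n_samples, n_markers)).astype(float)
C = rng.normal(size=(n_samples, n_covariates))
Y = rng.normal(size=(n_samples, n_phenotypes))

T = batched_linear_gwas(G, Y, C)
print(T.shape)
```

The design point is that the covariate projection and the genotype normalization are computed once, so adding more phenotypes only widens one side of a single matrix product. Converting the statistics to p-values would need a t distribution with `dof` degrees of freedom, which is left out to keep the sketch free of extra dependencies.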
Problem

Research questions and friction points this paper is trying to address.

GWAS
quantitative phenotypes
computational bottleneck
high-throughput screening
genotype-phenotype association
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU acceleration
high-throughput GWAS
multivariate phenotypes
linear association testing
hardware-accelerated genomics
Xingzhong Zhao
PhD, Fudan University
Bioinformatics, Imaging genetics, Precise medicine
Ziqian Xie
Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
Islam
Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
Sheikh Muhammad Saiful
Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
Tian Xia
Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
Chen
Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
Cheng
Department of Bioinformatics and Systems Medicine, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
Degui Zhi
Department Chair, Professor, University of Texas Health Science Center at Houston
EHR, Imaging genetics, Population Genetics Informatics