TorchGWAS: GPU-accelerated GWAS for thousands of quantitative phenotypes

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the computational cost of running traditional genome-wide association study (GWAS) tools trait by trait across large panels of quantitative phenotypes, which becomes a bottleneck for high-throughput genetic analyses. The authors present TorchGWAS, a PyTorch-based, GPU-accelerated framework for linear GWAS and multivariate phenotype screening, aimed at workflows in which the same genotype matrix is reused across a large phenotype panel. It accepts NumPy, PLINK, and BGEN genotype inputs, aligns phenotype and covariate tables by sample identifier, and performs covariate adjustment internally. In a benchmark with 8.9 million markers and 23,000 samples, it completed 2,048 phenotypes in 10 minutes and 20,480 phenotypes in 20 minutes on a single NVIDIA A100 GPU, an approximately 300- to 1,700-fold gain in phenotype throughput over fastGWA running on a 64-core AMD EPYC 7763 CPU. This makes the framework well suited to screening scenarios in which thousands of quantitative traits must be evaluated.
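To illustrate why reusing one genotype matrix across a phenotype panel lends itself to hardware acceleration, the sketch below tests every marker against every phenotype with a single matrix product after projecting out the covariates. It is a minimal NumPy illustration of batched linear association testing, not TorchGWAS's actual implementation: the function name `batched_linear_gwas`, the simulated data, and the choice of a QR-based projection are all assumptions made for this example. The same dense operations (QR factorization, matrix multiplication, column norms) are also available on GPU tensors in PyTorch.

```python
import numpy as np

def batched_linear_gwas(G, Y, C):
    """Hypothetical helper: test all markers against all phenotypes at once.

    G: (n_samples, n_markers) genotype dosages
    Y: (n_samples, n_phenotypes) quantitative phenotypes
    C: (n_samples, n_covariates) covariates, including an intercept column
    Returns (beta, t), each of shape (n_markers, n_phenotypes).
    """
    n, k = C.shape
    # Orthonormal basis of the covariate space. Projecting it out of both
    # G and Y is equivalent to including C in every per-marker regression.
    Q, _ = np.linalg.qr(C)
    G_res = G - Q @ (Q.T @ G)
    Y_res = Y - Q @ (Q.T @ Y)

    g_norm = np.linalg.norm(G_res, axis=0)
    y_norm = np.linalg.norm(Y_res, axis=0)

    # Partial correlations for every marker/phenotype pair in one product.
    R = (G_res / g_norm).T @ (Y_res / y_norm)

    # Convert correlations to regression t statistics and effect sizes.
    dof = n - k - 1
    t = R * np.sqrt(dof / (1.0 - R**2))
    beta = R * (y_norm[None, :] / g_norm[:, None])
    return beta, t

# Simulated data (assumed sizes, far smaller than a real cohort).
rng = np.random.default_rng(0)
n, m, p = 200, 50, 8
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
C = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.normal(size=(n, p))
Y[:, 0] += 0.5 * G[:, 3]  # plant one true association

beta, t = batched_linear_gwas(G, Y, C)
print(beta.shape, t.shape)
```

Because the covariate projection and the genotype residuals are computed once and shared by all phenotypes, adding more phenotypes only widens one matrix product, which is the kind of workload GPUs handle well.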

📝 Abstract

Motivation: Modern bioinformatics workflows, particularly in imaging and representation learning, can generate thousands to tens of thousands of quantitative phenotypes from a single cohort. In such settings, running genome-wide association analyses trait by trait rapidly becomes a computational bottleneck. While established GWAS tools are highly effective for individual traits, they are not optimized for phenotype-rich screening workflows in which the same genotype matrix is reused across a large phenotype panel.

Results: We present TorchGWAS, a framework for high-throughput association testing of large phenotype panels through hardware acceleration. The current public release provides stable Python and command-line workflows for linear GWAS and multivariate phenotype screening, supports NumPy, PLINK, and BGEN genotype inputs, aligns phenotype and covariate tables by sample identifier, and performs covariate adjustment internally. In a benchmark with 8.9 million markers and 23,000 samples, fastGWA required approximately 100 seconds per phenotype on an AMD EPYC 7763 64-core CPU, whereas TorchGWAS completed 2,048 phenotypes in 10 minutes and 20,480 phenotypes in 20 minutes on a single NVIDIA A100 GPU, corresponding to an approximately 300- to 1,700-fold increase in phenotype throughput. TorchGWAS therefore makes large-scale GWAS screening practical in phenotype-rich settings where thousands of quantitative traits must be evaluated efficiently.

Availability and implementation: TorchGWAS is implemented in Python and distributed as a documented source repository at https://github.com/ZhiGroup/TorchGWAS. The current release provides a command-line interface, packaged source code, tutorials, benchmark scripts, and example workflows.
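The 300- to 1,700-fold range can be reproduced from the benchmark figures quoted in the abstract. The short calculation below is only a sanity check on that arithmetic; it assumes the fastGWA baseline processes phenotypes one after another at the stated per-phenotype time, and the variable and function names are illustrative.

```python
# Figures quoted in the abstract.
FASTGWA_SECONDS_PER_PHENOTYPE = 100.0

# TorchGWAS wall-clock time in seconds, keyed by number of phenotypes.
runs = {
    2_048: 10 * 60,
    20_480: 20 * 60,
}

def throughput_gain(n_phenotypes, wall_seconds,
                    baseline=FASTGWA_SECONDS_PER_PHENOTYPE):
    """Ratio of baseline to GPU time per phenotype."""
    gpu_seconds_per_phenotype = wall_seconds / n_phenotypes
    return baseline / gpu_seconds_per_phenotype

gains = {n: throughput_gain(n, s) for n, s in runs.items()}
for n, gain in gains.items():
    print(f"{n} phenotypes: ~{gain:.0f}x")
# 2048 phenotypes: ~341x
# 20480 phenotypes: ~1707x
```

The gain grows with panel size because a tenfold increase in phenotypes only doubled the reported wall-clock time.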

Problem

Research questions and friction points this paper is trying to address.

GWAS

quantitative phenotypes

computational bottleneck

high-throughput screening

genotype-phenotype association

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU acceleration

high-throughput GWAS

multivariate phenotypes