Adaptive Bayesian computation for efficient biobank-scale genomic inference

📅 2025-09-12

🤖 AI Summary

At biobank scale, posterior inference in Bayesian hierarchical models is computationally prohibitive for genome-wide multi-trait association analysis. To address this bottleneck, we propose an adaptive focus (AF) strategy within a block coordinate ascent variational inference (CAVI) framework. At each iteration, only the parameter subsets for units deemed relevant based on current estimates are updated, which exploits the fact that biological effects of interest are often concentrated in a few units. Applied to multi-trait protein quantitative trait locus (pQTL) mapping with a joint model of hierarchically linked regressions that share parameters across traits, AF-CAVI achieves up to a 50% reduction in runtime on both simulated data and UK Biobank proteomic data while maintaining statistical performance. A genome-wide pipeline extends the approach to thousands of traits, making large-scale, multi-unit Bayesian analysis more practical in biobank studies.

📝 Abstract

Motivation: Modern biobanks, with unprecedented sample sizes and phenotypic diversity, have become foundational resources for genomic studies, enabling powerful cross-phenotype and population-scale analyses. As studies grow in complexity, Bayesian hierarchical models offer a principled framework for jointly modeling multiple units such as cells, traits, and experimental conditions, increasing statistical power through information sharing. However, adoption of Bayesian hierarchical models in biobank-scale studies remains limited due to computational inefficiencies, particularly in posterior inference over high-dimensional parameter spaces. Deterministic approximations such as variational inference provide scalable alternatives to Markov chain Monte Carlo, yet current implementations do not fully exploit the structure of genome-wide multi-unit modeling, especially when biological effects of interest are concentrated in a few units.

Results: We propose an adaptive focus (AF) strategy within a block coordinate ascent variational inference (CAVI) framework that selectively updates subsets of parameters at each iteration, corresponding to units deemed relevant based on current estimates. We illustrate this approach in protein quantitative trait locus (pQTL) mapping using a joint model of hierarchically linked regressions with shared parameters across traits. In both simulated data and real proteomic data from the UK Biobank, AF-CAVI achieves up to a 50% reduction in runtime while maintaining statistical performance. We also provide a genome-wide pipeline for multi-trait pQTL mapping across thousands of traits, demonstrating AF-CAVI as an efficient scheme for large-scale, multi-unit Bayesian analysis in biobanks.
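The control flow of adaptive-focus block CAVI can be illustrated with a minimal Python sketch. It assumes a deliberately simple stand-in model that is not the authors' hierarchical model: each trait is an independent Bayesian linear regression on shared genotypes with known noise variance `sigma2` and N(0, `tau2`) coefficient priors, fitted by mean-field CAVI with one block per trait. The relevance rule is also hypothetical: a trait stays in focus while its largest absolute posterior mean exceeds `threshold`, and every `full_every` iterations all traits are updated so that others can enter the focus set. The names `cavi_block` and `af_cavi`, and all parameter values, are illustrative and not part of any published software.

```python
import numpy as np

# Hypothetical stand-in model and focus rule; not the authors' AF-CAVI code.


def cavi_block(X, y, mu, s2, sigma2):
    """One CAVI sweep over the coefficients of a single trait (one block).

    Mean-field Gaussian regression with known noise variance sigma2 and
    N(0, tau2) priors: the variances s2 are fixed, so only the means move.
    Updates mu in place and returns the largest absolute change.
    """
    resid = y - X @ mu  # residual under the current means
    max_change = 0.0
    for j in range(X.shape[1]):
        resid += X[:, j] * mu[j]  # remove coefficient j from the fit
        new = s2[j] / sigma2 * (X[:, j] @ resid)  # optimal mean of q(beta_j)
        max_change = max(max_change, abs(new - mu[j]))
        mu[j] = new
        resid -= X[:, j] * mu[j]  # put the updated coefficient back
    return max_change


def af_cavi(X, Y, sigma2=1.0, tau2=1.0, threshold=0.3, full_every=5,
            tol=1e-10, max_iter=1000):
    """Block CAVI with a hypothetical adaptive-focus schedule.

    Every `full_every` iterations all traits are updated; in between, only
    traits whose largest |posterior mean| exceeds `threshold` are updated.
    Stops when a full sweep changes no mean by more than `tol`.
    """
    T, p = Y.shape[1], X.shape[1]
    s2 = 1.0 / (np.sum(X**2, axis=0) / sigma2 + 1.0 / tau2)  # fixed variances
    mu = np.zeros((T, p))  # row t holds the posterior means for trait t
    n_updates = 0  # number of block updates performed
    for it in range(max_iter):
        full = it % full_every == 0
        if full:
            active = range(T)  # re-screen every trait
        else:
            active = np.flatnonzero(np.abs(mu).max(axis=1) > threshold)
        change = 0.0
        for t in active:
            # mu[t] is a view, so cavi_block updates mu in place
            change = max(change, cavi_block(X, Y[:, t], mu[t], s2, sigma2))
            n_updates += 1
        if full and change < tol:
            break
    return mu, n_updates


# Usage on simulated data: 40 traits, only traits 0-2 carry an effect.
rng = np.random.default_rng(0)
n, p, T = 300, 10, 40
X = rng.standard_normal((n, p))
B = np.zeros((p, T))
B[0, :3] = 1.0
Y = X @ B + rng.standard_normal((n, T))

mu_af, n_af = af_cavi(X, Y)                    # adaptive focus
mu_full, n_full = af_cavi(X, Y, full_every=1)  # plain block CAVI
# Exact posterior means: ridge solution with penalty sigma2 / tau2 = 1.
ridge = np.linalg.solve(X.T @ X + np.eye(p), X.T @ Y).T
print(n_af, n_full, np.abs(mu_af - ridge).max())
```

Because this stand-in model has a closed-form posterior mean (ridge regression with penalty `sigma2 / tau2`), both schedules can be checked against `np.linalg.solve`. Stopping only after a full sweep guarantees that every block, focused or not, meets the tolerance at the end. In this toy all blocks converge at a similar rate, so `n_af` need not be smaller than `n_full`; any runtime savings depend on the structure of the actual model, which this sketch does not attempt to reproduce.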

Problem

Research questions and friction points this paper is trying to address.

Addresses the computational cost of posterior inference in Bayesian hierarchical models for biobank-scale genomic studies

Enables scalable hierarchical modeling for multi-trait genetic analyses

Reduces runtime by up to 50% while maintaining statistical performance in large-scale inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive focus (AF) strategy within block coordinate ascent variational inference (CAVI)

Selective updates, at each iteration, of the parameter subsets for units deemed relevant based on current estimates

Genome-wide pipeline for multi-trait pQTL mapping across thousands of traits

🔎 Similar Papers

PP-GWAS: Privacy Preserving Multi-Site Genome-wide Association Studies