A powerful transformation of quantitative responses for biobank-scale association studies

📅 2025-07-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In large-scale biobank genetic association studies, non-Gaussian errors impair statistical power for detecting weak genetic signals and hinder computational efficiency. To address this, we propose a novel response variable transformation method that leverages error density information. Our approach constructs a locally most powerful test statistic and designs a consistent, computationally efficient first-order optimization estimator, jointly enhancing statistical power and scalability. Theoretical analysis guarantees strict control of Type I error. Numerical experiments and an application to lung functional traits in the UK Biobank demonstrate that our method significantly improves statistical power over existing transformation approaches while maintaining linear time complexity even at million-sample scales. The key innovation lies in integrating nonparametric density estimation into the transformation function design—achieving, for the first time, both local optimality and computational tractability in large-scale settings.

Technology Category

Application Category

📝 Abstract
In linear regression models with non-Gaussian errors, transformations of the response variable are widely used in a broad range of applications. Motivated by various genetic association studies, transformation methods for hypothesis testing have received substantial interest. In recent years, the rise of biobank-scale genetic studies, which feature a vast number of participants that could be around half a million, spurred the need for new transformation methods that are both powerful for detecting weak genetic signals and computationally efficient for large-scale data. In this work, we propose a novel transformation method that leverages the information of the error density. This transformation leads to locally most powerful tests and therefore has strong power for detecting weak signals. To make the computation scalable to biobank-scale studies, we harnessed the nature of weak genetic signals and proposed a consistent and computationally efficient estimator of the transformation function. Through extensive simulations and a gene-based analysis of spirometry traits from the UK Biobank, we validate that our approach maintains stringent control over type I error rates and significantly enhances statistical power over existing methods.
Problem

Research questions and friction points this paper is trying to address.

Develop efficient transformation for biobank-scale genetic studies
Enhance power to detect weak genetic signals
Ensure computational scalability for large datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages error density for powerful transformations
Computationally efficient estimator for large-scale data
Enhances power while controlling type I error
🔎 Similar Papers
No similar papers found.
Y
Yaowu Liu
Joint Lab of Data Science and Business Intelligence, Center of Statistical Research, Southwestern University of Finance and Economics, Chengdu, Sichuan, 611130, China
Tianying Wang
Tianying Wang
Colorado State University
high-dimensional data analysismeasurement errorquantile regressiongenetics analysisstatistical genetics