Using Importance Sampling to Estimate $p$-values in All-Subset Meta-Analysis, with Applications to Single-Cell eQTL Mapping

📅 2026-04-24
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses limitations of ASSET, a subset-based meta-analysis method that relies on a normality assumption to compute p-values. Its analytic approximation had been evaluated only in limited scenarios and for p-values no smaller than $10^{-3}$, and it can become inaccurate when normality is violated (for example, with small sample sizes or low-frequency variants), while naïve Monte Carlo simulation is computationally prohibitive for very small p-values. The work systematically evaluates ASSET's accuracy in the extreme tail and introduces an efficient importance sampling (IS) algorithm that accurately estimates extremely small p-values for both independent and overlapping studies. When normality holds, ASSET's approximation is highly accurate across nearly the entire p-value range; when it is violated, ASSET p-values can be off by orders of magnitude, whereas the IS estimates remain accurate. The method is illustrated with single-cell eQTL analyses of peripheral blood mononuclear cells from the OneK1K cohort and lung cells from a Korean population.
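
To make the idea of a subset-based search concrete, the following is a minimal sketch of a one-sided all-subset statistic. It is an illustration, not the ASSET implementation: the function name `max_subset_z`, the sample-size weights, and the assumption of independent studies are all choices made for this example and are not taken from the page.

```python
from itertools import combinations
import math


def max_subset_z(z, n):
    """Return (best_z, best_subset) over all non-empty subsets of studies.

    z: per-study z-statistics.
    n: per-study sample sizes, used here as weights (an assumption of
       this sketch; other weighting schemes are possible).
    """
    k = len(z)
    best_z, best_subset = -math.inf, ()
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            total_n = sum(n[i] for i in subset)
            # Weighted sum of z-scores; it has unit variance under the
            # null when the studies in the subset are independent.
            z_s = sum(math.sqrt(n[i] / total_n) * z[i] for i in subset)
            if z_s > best_z:
                best_z, best_subset = z_s, subset
    return best_z, best_subset


# Hypothetical inputs: studies 0 and 2 carry signal, study 1 does not.
z_scores = [2.0, -1.0, 2.0]
sizes = [100, 100, 100]
best_z, best_subset = max_subset_z(z_scores, sizes)
```

Because the maximum is taken over all $2^K - 1$ subsets, its null distribution is not standard normal, which is why the p-value needs a multiple-testing-aware approximation or a simulation-based estimate.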

📝 Abstract
Pooling genome-wide association studies of multiple related traits can substantially increase power for detecting genetic variants with pleiotropic effects. ASSET, which exhaustively searches all subsets of studies for association signals, has been widely used to detect modest effects and improve interpretability. Under a normality assumption, ASSET computes p-values via an analytic approximation that accounts for multiple testing. However, this approximation has been evaluated only in limited scenarios and for p-values no smaller than $10^{-3}$. A systematic assessment in the extreme tail is therefore needed, yet naïve Monte Carlo methods would require prohibitively many simulations. We develop a computationally efficient importance-sampling (IS) algorithm that provides accurate ASSET p-value estimates for both independent and overlapping studies, achieving substantial efficiency gains over naïve Monte Carlo, particularly for very small p-values. Using IS, we show that ASSET's analytic approximation is highly accurate across nearly the entire p-value range when normality holds. In contrast, when normality is violated (due to small sample sizes, low-frequency variants, or non-normal traits), ASSET p-values can be inflated or deflated by orders of magnitude, whereas our IS approach remains accurate. We illustrate the method through applications to single-cell eQTL mapping using peripheral blood mononuclear cells from the OneK1K cohort and lung cells from a Korean population.
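
The efficiency argument can be illustrated on a much simpler problem than the one in the paper. The sketch below estimates the tail probability $P(Z > t)$ of a single standard normal statistic, once by naïve Monte Carlo and once by importance sampling from a proposal shifted to $N(t, 1)$. The shifted-normal proposal, the function names, and the threshold $t = 6$ are assumptions of this example; the authors' algorithm for the all-subset statistic is not reproduced here.

```python
import math
import random


def naive_mc_tail(t, n_draws, rng):
    """Fraction of N(0, 1) draws that exceed t."""
    hits = sum(1 for _ in range(n_draws) if rng.gauss(0.0, 1.0) > t)
    return hits / n_draws


def is_tail(t, n_draws, rng):
    """Importance-sampling estimate of P(Z > t) using draws from N(t, 1)."""
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(t, 1.0)
        if x > t:
            # Likelihood ratio phi(x) / phi(x - t) = exp(-t*x + t^2 / 2).
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n_draws


def exact_tail(t):
    """Exact upper-tail probability of N(0, 1), for comparison."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


rng = random.Random(0)  # fixed seed for reproducibility
t = 6.0                 # tail probability is on the order of 1e-9
naive = naive_mc_tail(t, 20_000, rng)
is_est = is_tail(t, 20_000, rng)
exact = exact_tail(t)
```

With a tail probability near $10^{-9}$, a naïve estimate based on 20,000 draws will almost always be exactly zero, and on the order of billions of draws would be needed to see even a few exceedances. The shifted proposal places about half of its draws beyond $t$, so the same budget typically yields an estimate within a few percent of the exact value.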
Problem

Research questions and friction points this paper is trying to address.

importance sampling
p-value estimation
meta-analysis
ASSET
normality violation
Innovation

Methods, ideas, or system contributions that make the work stand out.

importance sampling
all-subset meta-analysis
p-value estimation
single-cell eQTL
ASSET
Samuel Anyaso-Samuel
National Cancer Institute
Survival analysis, Microbiome data analysis, Metagenomics, Statistical process control
Thong Luong
Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, U.S.A.
Fei Qin
Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, U.S.A.
Jiyeon Choi
Korea institute of machinery and materials
Femtosecond laser direct writing
Kai Yu
Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, U.S.A.
Paul S. Albert
Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, U.S.A.
Jianxin Shi
Assistant Professor, Nankai University
Volumetric Video, Multimedia Communications, Mobile edge computing