A Robust Framework for Two-Sample Mendelian Randomization under Population Heterogeneity

📅 2026-04-23


🤖 AI Summary

This study addresses bias in causal effect estimation in two-sample Mendelian randomization that arises from population heterogeneity, such as differences in ancestral background, covariate adjustment, or measurement protocols, and proposes a model-free, robust inference framework. Moving beyond the conventional homogeneity assumption, the method directly accommodates heterogeneity in summary-level data while also handling practical challenges including measurement error, weak instruments, and pleiotropy. Built on a nonparametric, assumption-lean statistical strategy, the resulting estimator is consistent and asymptotically normal under heterogeneous designs and may be more efficient than the classical estimator even when the populations are homogeneous. Numerical simulations and a real-data cross-ancestry analysis of the causal effect of BMI on HDL cholesterol demonstrate the method's stability, robustness, and practical utility.
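To make the measurement-error and weak-instrument issues concrete, the sketch below contrasts the standard fixed-effect inverse-variance weighted (IVW) estimator with the debiased IVW (dIVW) estimator of Ye, Shao and Kang, which subtracts each SNP-exposure variance from the IVW denominator. It illustrates the problem setting under stated assumptions and is not the framework proposed in this paper. The simulation design (many weak SNPs with uniform effects, independent Gaussian noise with a common standard error, no pleiotropy, homogeneous populations) and the helper names `ivw`, `divw` and `simulate_weak_instruments` are hypothetical.

```python
import random


def ivw(gamma_hat, Gamma_hat, se_Gamma):
    """Fixed-effect IVW estimate: sum(g*G/s^2) / sum(g^2/s^2)."""
    num = sum(g * G / s**2 for g, G, s in zip(gamma_hat, Gamma_hat, se_Gamma))
    den = sum(g**2 / s**2 for g, s in zip(gamma_hat, se_Gamma))
    return num / den


def divw(gamma_hat, Gamma_hat, se_gamma, se_Gamma):
    """Debiased IVW: subtract the SNP-exposure variance from each denominator term."""
    num = sum(g * G / s**2 for g, G, s in zip(gamma_hat, Gamma_hat, se_Gamma))
    den = sum((g**2 - sg**2) / s**2
              for g, sg, s in zip(gamma_hat, se_gamma, se_Gamma))
    return num / den


def simulate_weak_instruments(beta, n_snps, se=0.005, seed=2026):
    """Hypothetical setup: many weak SNPs, no pleiotropy, homogeneous populations."""
    rng = random.Random(seed)
    gamma = [rng.uniform(0.0, 0.01) for _ in range(n_snps)]     # true SNP-exposure effects
    gamma_hat = [g + rng.gauss(0.0, se) for g in gamma]          # noisy exposure GWAS
    Gamma_hat = [beta * g + rng.gauss(0.0, se) for g in gamma]   # noisy outcome GWAS
    return gamma_hat, Gamma_hat, [se] * n_snps, [se] * n_snps


if __name__ == "__main__":
    g, G, sg, sG = simulate_weak_instruments(beta=0.5, n_snps=5000)
    # IVW is attenuated toward zero because noise in gamma_hat inflates its
    # denominator (E[gamma_hat^2] = gamma^2 + se^2); dIVW removes that inflation on average.
    print(f"IVW:  {ivw(g, G, sG):.3f}")
    print(f"dIVW: {divw(g, G, sg, sG):.3f}")
```

With these settings IVW should land well below the true value of 0.5 (roughly 0.29 in expectation), while dIVW should sit near 0.5 but with more spread. The correction trades bias for variance, and it can become unstable when instruments are so weak that the debiased denominator approaches zero.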


📝 Abstract

Mendelian randomization is a powerful tool for causal inference in observational studies. The two-sample summary-data design, which estimates genetic associations with exposures and outcomes in separate cohorts, is the most widely used Mendelian randomization approach in large-scale genomic studies. However, this approach relies on a strong assumption of population homogeneity across the two samples. In practice, available samples often differ in ancestry, demographics, socioeconomic factors, covariate adjustment, and measurement protocols. Violations of the homogeneity assumption can bias causal effect estimates and undermine the credibility of Mendelian randomization findings. We introduce a robust, model-free Mendelian randomization framework that directly addresses population heterogeneity in the two-sample summary-data setting. Our method avoids parametric assumptions about population differences and is designed to address real-world challenges, including measurement error, weak instruments, and pleiotropy. We show that the proposed estimator is consistent and asymptotically normal under heterogeneous designs, and may offer efficiency gains over the classic estimator even in homogeneous settings. Through numerical simulations and a real data analysis for estimating the causal effect of body mass index on high-density lipoprotein cholesterol across ancestrally diverse populations, we demonstrate the practical utility, stability, and robustness of our approach.
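The homogeneity assumption can be illustrated with the same IVW estimator. In the hypothetical simulation below, each SNP's effect on the exposure in the outcome population equals its effect in the exposure population multiplied by a single factor `scale_outcome_pop`. This is a deliberately crude stand-in for ancestry-related differences in linkage disequilibrium or allele frequency; real heterogeneity is generally SNP-specific. Because the outcome associations are then driven by the scaled effects, IVW computed from the exposure-sample associations approximately recovers `beta * scale_outcome_pop` rather than `beta`. The names `ivw_estimate` and `simulate_summary_data` are hypothetical, and the sketch shows the bias the paper targets, not its proposed estimator.

```python
import random


def ivw_estimate(gamma_hat, Gamma_hat, se_Gamma):
    """Fixed-effect IVW estimate of the causal effect from summary statistics."""
    num = sum(g * G / s**2 for g, G, s in zip(gamma_hat, Gamma_hat, se_Gamma))
    den = sum(g**2 / s**2 for g, s in zip(gamma_hat, se_Gamma))
    return num / den


def simulate_summary_data(beta, n_snps, scale_outcome_pop, se=0.005, seed=7):
    """Hypothetical two-population design with strong instruments and no pleiotropy.

    gamma_exp holds the SNP-exposure effects in the exposure population; the
    outcome population's SNP-exposure effects are gamma_exp * scale_outcome_pop.
    """
    rng = random.Random(seed)
    gamma_exp = [rng.uniform(0.02, 0.08) for _ in range(n_snps)]
    gamma_hat = [g + rng.gauss(0.0, se) for g in gamma_exp]
    # Outcome GWAS: the causal effect acts through the outcome population's SNP effects.
    Gamma_hat = [beta * scale_outcome_pop * g + rng.gauss(0.0, se) for g in gamma_exp]
    return gamma_hat, Gamma_hat, [se] * n_snps


if __name__ == "__main__":
    for scale in (1.0, 0.6):
        g, G, s = simulate_summary_data(beta=0.5, n_snps=200, scale_outcome_pop=scale)
        print(f"scale={scale}: IVW = {ivw_estimate(g, G, s):.3f} (true beta = 0.5)")
```

With `scale_outcome_pop=1.0` the estimate should be close to 0.5. With 0.6 it should drift toward about 0.3, even though the true causal effect is unchanged. Adding more SNPs or larger samples does not remove this bias, because it comes from the mismatch between populations rather than from sampling noise.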
