🤖 AI Summary
This study addresses a central challenge in Mendelian randomization: population stratification and assortative mating can induce associations between genetic instruments and unobserved confounders, violating the instrument independence (exogeneity) assumption. To tackle this, the authors apply the principle of cross-environment invariance and propose a representation learning framework that uses multi-environment genetic data to recover latent instrumental components that satisfy the exogeneity condition. Theoretical analysis establishes identifiability guarantees for these latent instruments under various mixing mechanisms. Simulations and semi-synthetic experiments using data from the All of Us Research Hub demonstrate the method's effectiveness for causal effect estimation.
📝 Abstract
Mendelian Randomization (MR) is a prominent observational epidemiological research method designed to address unobserved confounding when estimating causal effects. However, core assumptions -- particularly the independence between instruments and unobserved confounders -- are often violated due to population stratification or assortative mating. Leveraging the increasing availability of multi-environment data, we propose a representation learning framework that exploits cross-environment invariance to recover latent exogenous components of genetic instruments. We provide theoretical guarantees for identifying these latent instruments under various mixing mechanisms and demonstrate the effectiveness of our approach through simulations and semi-synthetic experiments using data from the All of Us Research Hub.
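To make the violated assumption concrete, the following is a minimal sketch of how population stratification can bias a standard single-instrument MR estimate (the Wald ratio). It is not the authors' method, and every name and number in it (`simulate_wald`, the allele frequencies, the confounder shifts, the true effect of 0.5) is a hypothetical choice made for illustration. Two subpopulations differ in both allele frequency and the mean of an unobserved confounder, so the pooled genotype becomes correlated with the confounder.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate_wald(n, allele_freqs, confounder_shifts, beta=0.5, seed=0):
    """Simulate pooled data from two subpopulations and return the Wald ratio.

    allele_freqs[e] and confounder_shifts[e] are the allele frequency and the
    confounder mean in subpopulation e (illustrative values, not from the paper).
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        e = rng.randrange(2)  # subpopulation label, unobserved by the analyst
        p = allele_freqs[e]
        gi = int(rng.random() < p) + int(rng.random() < p)  # genotype in {0, 1, 2}
        u = confounder_shifts[e] + rng.gauss(0, 1)  # unobserved confounder
        xi = gi + u + rng.gauss(0, 1)  # exposure
        yi = beta * xi + u + rng.gauss(0, 1)  # outcome; true causal effect is beta
        g.append(gi)
        x.append(xi)
        y.append(yi)
    # Wald ratio: cov(G, Y) / cov(G, X)
    return covariance(g, y) / covariance(g, x)


# Stratified: allele frequency and confounder mean both differ by subpopulation.
stratified = simulate_wald(20000, allele_freqs=(0.2, 0.6), confounder_shifts=(0.0, 1.0))
# Homogeneous: same allele frequency everywhere, so G is independent of U.
homogeneous = simulate_wald(20000, allele_freqs=(0.4, 0.4), confounder_shifts=(0.0, 1.0))

print(f"true effect: 0.5, stratified: {stratified:.3f}, homogeneous: {homogeneous:.3f}")
```

In this toy setup the homogeneous estimate should land near the true effect, while the stratified estimate is pushed upward because the genotype carries information about the confounder through subpopulation membership.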