AI Summary
Mendelian randomization (MR) faces methodological challenges including invalid or weak instrumental variables (IVs), design differences across data structures, and adaptation to high-dimensional omics data. This tutorial review organizes MR methods around causal estimands and covers family- and population-based study designs, individual- and summary-level data, and one- and two-sample MR settings. It compares available strategies for detecting and correcting bias from invalid and weak IVs, and summarizes recent methodological advances and software tools for settings with many weak or invalid instruments and for high-dimensional omics data. Real-data applications using UK Biobank and Alzheimer's disease proteomics studies illustrate the methods in practice. The review is intended as a tutorial-style reference for methodologists and applied scientists in biomedical and public health research.
Abstract
Mendelian randomization (MR) has become an essential tool for causal inference in biomedical and public health research. By using genetic variants as instrumental variables, MR helps address unmeasured confounding and reverse causation, offering a quasi-experimental framework to evaluate causal effects of modifiable exposures on health outcomes. Despite its promise, MR faces substantial methodological challenges, including invalid instruments, weak instrument bias, and design complexities across different data structures. In this tutorial review, we provide a comprehensive overview of MR methods for causal inference, emphasizing clarity of causal interpretation, study design comparisons, availability of software tools, and practical guidance for applied scientists. We organize the review around causal estimands, ensuring that analyses are anchored to well-defined causal questions. We discuss the problems of invalid and weak instruments, comparing available strategies for their detection and correction. We integrate discussions of population-based versus family-based MR designs, analyses based on individual-level versus summary-level data, and one-sample versus two-sample MR designs, highlighting their relative advantages and limitations. We also summarize recent methodological advances and software developments that extend MR to settings with many weak or invalid instruments and to modern high-dimensional omics data. Real-data applications, including UK Biobank and Alzheimer's disease proteomics studies, illustrate the use of these methods in practice. This review aims to serve as a tutorial-style reference for both methodologists and applied scientists.