Overcoming Selection Bias in Statistical Studies With Amortized Bayesian Inference

📅 2026-04-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses selection bias arising when the probability of inclusion in observational data depends on the target variable—a distortion that systematically biases parameter estimation and uncertainty quantification. To tackle this challenge, the authors propose a novel simulation-based inference framework that reframes selection bias correction as a simulation problem. By explicitly embedding the selection mechanism within a generative simulator and leveraging neural posterior estimation, the method enables amortized Bayesian inference without requiring an explicit likelihood function. Notably, it accommodates selection mechanisms that depend on unobserved variables and supports formal testing for the presence of bias as well as posterior calibration assessment. Across three statistical applications with distinct selection mechanisms, the framework recovers well-calibrated posterior distributions and substantially outperforms conventional likelihood-based approaches, even under severe selection bias.
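
As a hedged illustration of why selection distorts inference, the data that actually enter the study follow a selection-conditioned likelihood rather than the nominal one. The notation below is an assumption introduced for this note and is not taken from the paper: $\theta$ denotes the parameters, $x$ an observation, and $S = 1$ the event that the observation is included.

```latex
% Likelihood of an observation given that it was selected (Bayes' rule):
\[
  p(x \mid \theta, S = 1)
  = \frac{p(S = 1 \mid x, \theta)\, p(x \mid \theta)}
         {\int p(S = 1 \mid x', \theta)\, p(x' \mid \theta)\, \mathrm{d}x'}
\]
% Posterior targeted by a selection-aware analysis:
\[
  p(\theta \mid x, S = 1) \;\propto\; p(\theta)\, p(x \mid \theta, S = 1)
\]
```

The normalizing integral in the denominator is what typically makes a likelihood-based correction intractable in complex models. A simulator that applies the selection step itself produces draws from $p(x \mid \theta, S = 1)$ directly, so the integral never has to be evaluated.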

📝 Abstract
Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in epidemiological or survey settings, individuals with certain outcomes may be more likely to be included, resulting in biased prevalence estimates with potentially substantial downstream impact. Classical corrections, such as inverse-probability weighting or explicit likelihood-based models of the selection process, rely on tractable likelihoods, which limits their applicability in complex stochastic models with latent dynamics or high-dimensional structure. Simulation-based inference enables Bayesian analysis without tractable likelihoods but typically assumes missingness at random and thus fails when selection depends on unobserved outcomes or covariates. Here, we develop a bias-aware simulation-based inference framework that explicitly incorporates selection into neural posterior estimation. By embedding the selection mechanism directly into the generative simulator, the approach enables amortized Bayesian inference without requiring tractable likelihoods. This recasting of selection bias as part of the simulation process allows us to both obtain debiased estimates and explicitly test for the presence of bias. The framework integrates diagnostics to detect discrepancies between simulated and observed data and to assess posterior calibration. The method recovers well-calibrated posterior distributions across three statistical applications with diverse selection mechanisms, including settings in which likelihood-based approaches yield biased estimates. These results recast the correction of selection bias as a simulation problem and establish simulation-based inference as a practical and testable strategy for parameter estimation under selection bias.
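
The idea of embedding the selection mechanism in the generative simulator can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian outcome model, the logistic selection rule, the uniform prior, and every function name (`simulate_selected`, `summary`, `abc_posterior`) are hypothetical. To stay dependency-free, plain rejection ABC is used as a stand-in for neural posterior estimation; the sketch only shows that simulating the selection step lets a likelihood-free method correct a bias that the naive sample mean retains.

```python
import numpy as np

def simulate_selected(mu, n, rng, slope=2.0):
    """Simulate n *selected* outcomes.

    Outcomes are y ~ N(mu, 1); each one enters the dataset with
    probability sigmoid(slope * y), so inclusion depends on the outcome.
    """
    kept = np.empty(0)
    while kept.size < n:
        y = rng.normal(mu, 1.0, size=4 * n)
        p_keep = 1.0 / (1.0 + np.exp(-slope * y))
        kept = np.concatenate([kept, y[rng.random(y.size) < p_keep]])
    return kept[:n]

def summary(y):
    """Summary statistics compared between simulated and observed data."""
    return np.array([y.mean(), y.std()])

def abc_posterior(observed, n_sims, rng, keep_frac=0.02):
    """Rejection ABC: keep the prior draws whose simulated, selected
    datasets lie closest to the observed one in summary space."""
    mus = rng.uniform(-3.0, 3.0, size=n_sims)  # assumed prior on mu
    stats = np.array(
        [summary(simulate_selected(m, observed.size, rng)) for m in mus]
    )
    dist = np.linalg.norm(stats - summary(observed), axis=1)
    n_keep = max(1, int(keep_frac * n_sims))
    return mus[np.argsort(dist)[:n_keep]]

rng = np.random.default_rng(0)
true_mu = -0.5
observed = simulate_selected(true_mu, 200, rng)

naive_estimate = observed.mean()            # ignores the selection step
posterior = abc_posterior(observed, 2000, rng)
corrected_estimate = posterior.mean()       # selection-aware estimate

print("naive:", naive_estimate, "corrected:", corrected_estimate)
```

Because the simulator only ever returns selected observations, the accepted prior draws approximate the posterior conditioned on selection. In an amortized setting, the rejection step would be replaced by a neural posterior estimator trained once on such simulated pairs.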
Problem

Research questions and friction points this paper is trying to address.

selection bias
statistical estimation
uncertainty quantification
biased sampling
epidemiological studies
Innovation

Methods, ideas, or system contributions that make the work stand out.

selection bias
simulation-based inference
amortized Bayesian inference
neural posterior estimation
posterior calibration
👥 Authors
Jonas Arruda
Bonn Center for Mathematical Life Sciences, University of Bonn, Bonn, Germany
Sophie Chervet
Epidemiology and Modeling of Antibiotic Evasion Unit, Institut Pasteur, Paris, France
Paula Staudt
Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
Andreas Wieser
Institute of Infectious Diseases and Tropical Medicine, LMU University Hospital, Munich, Germany
Michael Hoelscher
Director of Tropical Medicine, LMU University Hospital
Infectious Diseases, Global Health, Diagnostics, Clinical Trials
Isabelle Sermet-Gaudelus
Centre de Référence Maladies Rares, Mucoviscidose et Maladies Apparentées, Site Constitutif Pédiatrique, Hôpital Necker Enfants Malades, Paris, France
Nadine Binder
Institute of General Practice/Family Medicine, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
Lulla Opatowski
Epidemiology and Modeling of Antibiotic Evasion Unit, Institut Pasteur, Paris, France
Jan Hasenauer
Universität Bonn
Systems Biology, Data Analysis, Mathematical Modelling