AI Summary
Existing deep generative models lack Neyman orthogonality when estimating conditional potential outcome distributions, and therefore fail to guarantee quasi-oracle efficiency and double robustness. To address this, we propose the GDR-learners framework, the first systematic integration of Neyman orthogonality into generative modeling. GDR-learners support diverse architectures, including conditional normalizing flows (GDR-CNFs), conditional GANs (GDR-CGANs), conditional VAEs (GDR-CVAEs), and conditional diffusion models (GDR-CDMs). The framework enables doubly robust estimation with respect to both the treatment mechanism and the outcome model, achieving asymptotically optimal convergence rates. We establish theoretical guarantees of quasi-oracle efficiency under standard regularity conditions. Empirically, GDR-learners significantly outperform state-of-the-art methods across multiple semi-synthetic benchmarks, delivering an accurate, high-fidelity characterization of conditional potential outcome distributions.
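To make the double-robustness claim concrete, the display below gives the standard Neyman-orthogonal (doubly robust) pseudo-outcome for the conditional CDF of a potential outcome. The notation (estimated propensity $\widehat{\pi}_a$ and plug-in outcome model $\widehat{F}_a$) is ours and only illustrates the type of orthogonal correction the GDR-learners build on; it is not the paper's exact loss.

$$
\widetilde{F}_a(y \mid X) \;=\; \widehat{F}_a(y \mid X) \;+\; \frac{\mathbb{1}\{A = a\}}{\widehat{\pi}_a(X)}\Bigl(\mathbb{1}\{Y \le y\} - \widehat{F}_a(y \mid X)\Bigr)
$$

The correction term has conditional mean zero whenever either the treatment mechanism $\widehat{\pi}_a$ or the outcome model $\widehat{F}_a$ is correctly specified, which is the double robustness property; regressing such a pseudo-outcome on $X$ in a second stage is what yields the quasi-oracle behavior referred to above.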
Abstract
Various deep generative models have been proposed to estimate potential outcome distributions from observational data. However, none of them has the favorable theoretical property of general Neyman orthogonality and, associated with it, quasi-oracle efficiency and double robustness. In this paper, we introduce a general suite of generative Neyman-orthogonal (doubly robust) learners that estimate the conditional distributions of potential outcomes. Our proposed GDR-learners are flexible and can be instantiated with many state-of-the-art deep generative models. In particular, we develop GDR-learners based on (a) conditional normalizing flows (which we call GDR-CNFs), (b) conditional generative adversarial networks (GDR-CGANs), (c) conditional variational autoencoders (GDR-CVAEs), and (d) conditional diffusion models (GDR-CDMs). Unlike existing methods, our GDR-learners possess the properties of quasi-oracle efficiency and rate double robustness, and are thus asymptotically optimal. In a series of (semi-)synthetic experiments, we demonstrate that our GDR-learners are highly effective and outperform existing methods in estimating the conditional distributions of potential outcomes.
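As a minimal sketch of the two-stage structure described above, the code below implements a plain doubly robust learner for the conditional CDF of the treated potential outcome on a grid of thresholds, using standard regression models in the second stage. All data, model choices, and names here are illustrative assumptions; the paper's GDR-learners instead plug the orthogonal correction into deep generative models (CNFs, CGANs, CVAEs, CDMs) and use cross-fitting, which is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic observational data (purely illustrative, not from the paper).
n = 4000
X = rng.normal(size=(n, 2))
propensity = 1.0 / (1.0 + np.exp(-X[:, 0]))            # treatment depends on X
A = rng.binomial(1, propensity)
Y = X[:, 1] + A * (1.0 + 0.5 * X[:, 0]) + rng.normal(size=n)

thresholds = np.linspace(-1.0, 3.0, 5)                  # y-grid for the CDF

# Stage 1: nuisance models (cross-fitting omitted for brevity).
pi_hat = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]   # propensity
treated = A == 1
mu_hat = np.zeros((n, len(thresholds)))                 # P(Y <= y | X, A = 1)
for j, y in enumerate(thresholds):
    clf = GradientBoostingClassifier().fit(X[treated], (Y[treated] <= y).astype(int))
    mu_hat[:, j] = clf.predict_proba(X)[:, 1]

# Stage 2: regress the doubly robust pseudo-outcome on X for each threshold.
# The pseudo-outcome remains conditionally unbiased if either nuisance is correct.
cdf_models = []
for j, y in enumerate(thresholds):
    pseudo = mu_hat[:, j] + A / pi_hat * ((Y <= y).astype(float) - mu_hat[:, j])
    cdf_models.append(GradientBoostingRegressor().fit(X, pseudo))

# Estimated conditional CDF of Y(1) at a new covariate value.
x_new = np.array([[0.5, -1.0]])
F_hat = np.clip([m.predict(x_new)[0] for m in cdf_models], 0.0, 1.0)
print(dict(zip(np.round(thresholds, 2).tolist(), np.round(F_hat, 3).tolist())))
```

The design choice to model one threshold at a time keeps the sketch simple; a generative second stage, as in the paper, instead characterizes the full conditional distribution in one model and can be sampled from directly.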