Debiased Counterfactual Generation via Flow Matching from Observations

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing counterfactual distribution estimators treat the counterfactual distribution as a standalone generative target and ignore its connection to the observed data, which can lead to bias and poor generation quality. The authors propose a deconfounding flow-matching framework that exploits the links between the two distributions: identical support and tail behavior, statistical closeness under weak confounding, and shared confounder-invariant features. The main contributions are a semiparametrically efficient estimator based on an efficient influence function correction, and an extension that targets minimal-energy flows in high dimensions, which the authors show can be especially simple targets between observational and counterfactual distributions. In experiments, the method outperforms existing debiased counterfactual distribution estimators and mitigates known failure modes of flow-based methods.

📝 Abstract

Estimating counterfactual distributions under interventions is central to treatment risk assessment and counterfactual generation tasks. Existing approaches model the counterfactual distribution as a standalone generative target, without exploiting its relationship to the observational data. In this work, we show that under standard assumptions, observational and counterfactual outcome distributions are tightly linked: they have identical support and tail behavior, remain statistically close under weak confounding, and share any features of high-dimensional outcomes that are invariant to confounders. These properties motivate learning counterfactual distributions not from scratch, but via a deconfounding flow from the observational distribution. We formulate this problem via flow matching and derive a semiparametrically efficient estimator based on a novel efficient influence function correction. We subsequently extend our estimator to target minimal-energy flows in high dimensions, which we show can be especially simple targets between observational and counterfactual distributions. In experiments, deconfounding flows outperform existing debiased counterfactual distribution estimators, while also mitigating known failure modes of flow-based methods.
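
As a rough illustration of the generic flow-matching idea the abstract builds on, the sketch below regresses a velocity field onto straight-line displacements between a source and a target sample, then integrates that field to transport the source. It is not the authors' estimator: it omits the efficient influence function correction and the minimal-energy target, and it assumes paired samples from both distributions are directly available, which is not the case with real observational data. The function names (`make_regression_pairs`, `fit_linear_velocity`, `transport`), the linear velocity model, and the toy shift are all hypothetical choices made for brevity.

```python
import numpy as np


def make_regression_pairs(x0, x1, rng):
    """Build flow-matching regression pairs on the straight-line path.

    x0: source samples (stand-in for observational outcomes), shape (n, d)
    x1: target samples (stand-in for counterfactual outcomes), shape (n, d)
    """
    t = rng.uniform(size=(x0.shape[0], 1))   # random time in [0, 1]
    xt = (1.0 - t) * x0 + t * x1             # point on the interpolation path
    target = x1 - x0                         # velocity of that path
    return np.hstack([xt, t]), target


def fit_linear_velocity(inputs, targets):
    """Least-squares fit of v(x, t) = [x, t, 1] @ W (assumed linear model)."""
    design = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def transport(x0, weights, steps=50):
    """Push samples from t=0 to t=1 with forward Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((x.shape[0], 1), k * dt)
        design = np.hstack([x, t, np.ones((x.shape[0], 1))])
        x = x + dt * (design @ weights)
    return x


# Toy data (assumption): the target is the source shifted by a constant.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(500, 2))
shift = np.array([2.0, -1.0])
x1 = x0 + shift

inputs, targets = make_regression_pairs(x0, x1, rng)
weights = fit_linear_velocity(inputs, targets)
x_end = transport(x0, weights)
```

In this toy setting the true velocity is the constant shift, so the linear model can represent it exactly and the transported samples land on the shifted ones. In realistic settings a neural network would typically replace the linear model, and the counterfactual side of the regression would have to be identified from observational data rather than sampled directly.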

Problem

Research questions and friction points this paper is trying to address.

counterfactual estimation

debiased generation

observational data

distribution learning

treatment effect

Innovation

Methods, ideas, or system contributions that make the work stand out.

deconfounding flow

flow matching

counterfactual generation