Frequentist Consistency of Prior-Data Fitted Networks for Causal Inference

๐Ÿ“… 2026-03-12
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work studies the frequentist consistency of Prior-Data Fitted Networks (PFNs) for causal inference. When interpreted as Bayesian estimators of the average treatment effect (ATE), existing PFNs can suffer from prior-induced confounding bias: the prior is not asymptotically overwritten by the data, so the estimates need not converge to the true ATE as the sample size grows. To resolve this, the authors propose a One-Step Posterior Correction (OSPC) that recalibrates PFN posteriors and is implemented via martingale posteriors built on top of the PFN. The correction restores frequentist consistency and yields a semiparametric Bernstein–von Mises theorem, so the calibrated PFN estimator shares the asymptotic distribution of classical semiparametric efficient estimators. Experiments on (semi-)synthetic data show that the calibrated PFN's ATE uncertainty matches frequentist uncertainty asymptotically and is better calibrated in finite samples than alternative Bayesian ATE estimators.
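To give a rough sense of what a one-step correction for the ATE looks like, the sketch below applies the classical efficient-influence-function (AIPW-style) update to a plug-in estimate. The function name `one_step_ate` and the assumption that outcome regressions and propensity scores are available as arrays are illustrative only; the paper's OSPC operates on PFN posteriors and may differ in its details.

```python
# Hedged sketch: classical one-step (AIPW-style) correction of a plug-in ATE.
# Nuisance estimates are assumed given as arrays (e.g., predictions from some
# fitted model); this is NOT the paper's exact OSPC procedure.
import numpy as np

def one_step_ate(y, t, mu0, mu1, e):
    """One-step corrected ATE estimate.

    y   : observed outcomes
    t   : binary treatment indicators
    mu0 : estimated E[Y | X, T=0]
    mu1 : estimated E[Y | X, T=1]
    e   : estimated propensity scores P(T=1 | X)
    """
    plug_in = np.mean(mu1 - mu0)  # plug-in (G-computation) estimate
    # Inverse-propensity-weighted residual augmentation; the plug-in estimate
    # plus the mean of this term is the AIPW / one-step estimator based on the
    # ATE's efficient influence function.
    correction = t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return plug_in + np.mean(correction)
```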

๐Ÿ“ Abstract
Foundation models based on prior-data fitted networks (PFNs) have shown strong empirical performance in causal inference by framing the task as an in-context learning problem. However, it is unclear whether PFN-based causal estimators provide uncertainty quantification that is consistent with classical frequentist estimators. In this work, we address this gap by analyzing the frequentist consistency of PFN-based estimators for the average treatment effect (ATE). (1) We show that existing PFNs, when interpreted as Bayesian ATE estimators, can exhibit prior-induced confounding bias: the prior is not asymptotically overwritten by data, which, in turn, prevents frequentist consistency. (2) As a remedy, we suggest employing a calibration procedure based on a one-step posterior correction (OSPC). We show that the OSPC helps to restore frequentist consistency and can yield a semiparametric Bernstein–von Mises theorem for calibrated PFNs (i.e., both the calibrated PFN-based estimators and the classical semiparametric efficient estimators converge to the same limiting distribution as the data size grows). (3) Finally, we implement the OSPC by tailoring martingale posteriors on top of the PFNs. In this way, we are able to recover the functional nuisance posteriors from PFNs that the OSPC requires. In multiple (semi-)synthetic experiments, PFNs calibrated with our martingale-posterior OSPC produce ATE uncertainty that (i) asymptotically matches frequentist uncertainty and (ii) is well calibrated in finite samples in comparison to other Bayesian ATE estimators.
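To make the martingale-posterior idea concrete, here is a deliberately simplified predictive-resampling sketch: posterior uncertainty about the ATE is obtained by repeatedly simulating hypothetical future data from a predictive model and re-evaluating the ATE functional. The Gaussian outcome predictive, the fixed (non-updated) predictive across the simulated stream, and the function name are simplifying assumptions; the paper tailors martingale posteriors to PFNs and feeds the resulting nuisance posteriors into the OSPC.

```python
# Simplified predictive-resampling sketch in the spirit of martingale posteriors.
# A full martingale posterior updates the predictive after every simulated
# observation; this sketch keeps the predictive fixed for brevity (an assumption).
import numpy as np

rng = np.random.default_rng(0)

def predictive_resampling_ate(mu0, mu1, sigma, n_future=2000, n_draws=500):
    """Approximate posterior draws of the ATE via predictive resampling.

    mu0, mu1 : predictive means of the untreated / treated outcome per unit
    sigma    : predictive outcome noise scale (assumed Gaussian, homoscedastic)
    """
    n = len(mu0)
    draws = np.empty(n_draws)
    for b in range(n_draws):
        idx = rng.integers(0, n, size=n_future)            # resample covariate profiles
        y1 = mu1[idx] + rng.normal(0.0, sigma, n_future)   # simulate future Y(1)
        y0 = mu0[idx] + rng.normal(0.0, sigma, n_future)   # simulate future Y(0)
        draws[b] = np.mean(y1 - y0)                        # ATE functional on simulated data
    return draws
```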
Problem

Research questions and friction points this paper is trying to address.

frequentist consistency
prior-data fitted networks
causal inference
average treatment effect
uncertainty quantification
Innovation

Methods, ideas, or system contributions that make the work stand out.

prior-data fitted networks
frequentist consistency
one-step posterior correction
martingale posteriors
average treatment effect
๐Ÿ”Ž Similar Papers
No similar papers found.