Causal-EPIG: A Prediction-Oriented Active Learning Framework for CATE Estimation

📅 2025-09-26

🤖 AI Summary

Problem: Conventional active learning strategies for Conditional Average Treatment Effect (CATE) estimation suffer from an objective mismatch: their acquisition functions reduce uncertainty in model parameters or in observed factual outcomes, not in the unobserved potential outcomes or the CATE itself. Method: The paper proposes Causal-EPIG, an active learning framework that adapts Expected Predictive Information Gain (EPIG) so that each query is scored by how much it reduces uncertainty about causal quantities. Two strategies follow from the framework: a comprehensive one that models the joint potential outcomes for robustness, and a focused one that targets the CATE directly for sample efficiency. Contribution/Results: In the reported experiments, both strategies consistently outperform standard active learning baselines. The results also show that the better strategy depends on the base estimator and on data complexity, which gives practitioners context-specific guidance for choosing between them. The framework thereby offers a principled guide for sample-efficient CATE estimation.

📝 Abstract

Estimating the Conditional Average Treatment Effect (CATE) is often constrained by the high cost of obtaining outcome measurements, making active learning essential. However, conventional active learning strategies suffer from a fundamental objective mismatch. They are designed to reduce uncertainty in model parameters or in observable factual outcomes, failing to directly target the unobservable causal quantities that are the true objects of interest. To address this misalignment, we introduce the principle of causal objective alignment, which posits that acquisition functions should target unobservable causal quantities, such as the potential outcomes and the CATE, rather than indirect proxies. We operationalize this principle through the Causal-EPIG framework, which adapts the information-theoretic criterion of Expected Predictive Information Gain (EPIG) to explicitly quantify the value of a query in terms of reducing uncertainty about unobservable causal quantities. From this unified framework, we derive two distinct strategies that embody a fundamental trade-off: a comprehensive approach that robustly models the full causal mechanisms via the joint potential outcomes, and a focused approach that directly targets the CATE estimand for maximum sample efficiency. Extensive experiments demonstrate that our strategies consistently outperform standard baselines, and crucially, reveal that the optimal strategy is context-dependent, contingent on the base estimator and data complexity. Our framework thus provides a principled guide for sample-efficient CATE estimation in practice.
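For orientation, the acquisition criteria described in the abstract can be written schematically. The first line is the standard EPIG definition, where a candidate input $x$ is scored by the expected mutual information between its label $y$ and the prediction $y_*$ at a target input $x_*$ drawn from a target distribution $p_*$. The two causal variants below are one plausible reading of the abstract, not the paper's exact formulas; the symbols $\alpha_\tau$, $\alpha_Y$, and $p_*$ are notation assumed here for illustration.

```latex
\begin{aligned}
\mathrm{EPIG}(x) &= \mathbb{E}_{p_*(x_*)}\big[\, I(y;\, y_* \mid x, x_*) \,\big] \\[4pt]
\text{focused (CATE):}\quad
\alpha_\tau(x, t) &= \mathbb{E}_{p_*(x_*)}\big[\, I\big(y;\, \tau(x_*) \mid x, t, x_*\big) \,\big],
\qquad \tau(x_*) = \mathbb{E}\big[\,Y(1) - Y(0) \mid X = x_*\,\big] \\[4pt]
\text{comprehensive (joint):}\quad
\alpha_Y(x, t) &= \mathbb{E}_{p_*(x_*)}\big[\, I\big(y;\, (Y_*(0),\, Y_*(1)) \mid x, t, x_*\big) \,\big]
\end{aligned}
```

Here a query is a covariate-treatment pair $(x, t)$ whose factual outcome $y$ would be measured, and $Y_*(0), Y_*(1)$ denote the potential outcomes at the target input $x_*$.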

Problem

Research questions and friction points this paper is trying to address.

Outcome measurements for CATE estimation are costly, which makes active learning essential

Conventional strategies misalign with unobservable causal quantities of interest

Acquisition functions need to target uncertainty in causal effects directly

Innovation

Methods, ideas, or system contributions that make the work stand out.

Active learning framework targets causal quantities directly

Adapts Expected Predictive Information Gain for causal alignment

Derives two strategies: joint potential-outcome modeling and direct CATE targeting
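The contrast between the two strategies can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a Bayesian linear model with a Gaussian weight posterior (covariance `S`), a hypothetical feature map `features(x, t)`, and the closed-form mutual information between jointly Gaussian variables. All function names and the `mode` argument are invented for this example.

```python
import numpy as np

def features(x, t):
    """Hypothetical feature map phi(x, t) = [1, x, t, t*x] for a scalar covariate x."""
    return np.array([1.0, x, float(t), t * x])

def gaussian_mi(var_y, cov_yz, cov_zz):
    """Mutual information I(y; z) for jointly Gaussian scalar y and vector z."""
    explained = cov_yz @ np.linalg.solve(cov_zz, cov_yz)
    return 0.5 * np.log(var_y / (var_y - explained))

def causal_epig(x, t, targets, S, noise_var, mode):
    """Score the query (x, t) by its average information about a causal target.

    mode="cate":  target is tau(x*) = f(x*, 1) - f(x*, 0)       (focused)
    mode="joint": target is the noisy pair (Y*(0), Y*(1))       (comprehensive)
    S is the posterior covariance of the linear weights (assumed given).
    """
    phi_q = features(x, t)
    var_y = phi_q @ S @ phi_q + noise_var  # predictive variance of the queried outcome
    scores = []
    for xs in targets:
        phi0, phi1 = features(xs, 0), features(xs, 1)
        if mode == "cate":
            A = (phi1 - phi0)[:, None]          # one target column: the CATE contrast
            extra = 1e-9                        # jitter only; CATE carries no outcome noise
        else:
            A = np.stack([phi0, phi1], axis=1)  # two target columns: both potential outcomes
            extra = noise_var
        cov_yz = phi_q @ S @ A
        cov_zz = A.T @ S @ A + extra * np.eye(A.shape[1])
        scores.append(gaussian_mi(var_y, cov_yz, cov_zz))
    return float(np.mean(scores))

def select_query(candidates, targets, S, noise_var, mode):
    """Return the candidate (x, t) pair with the highest score."""
    return max(candidates, key=lambda c: causal_epig(c[0], c[1], targets, S, noise_var, mode))

# Posterior in which only the baseline weights are uncertain; the treatment-effect
# weights are already known exactly, so no query can be informative about the CATE.
S_baseline_only = np.diag([1.0, 1.0, 0.0, 0.0])
targets = [-1.0, 0.0, 1.0]
score_cate = causal_epig(0.5, 1, targets, S_baseline_only, 0.1, "cate")
score_joint = causal_epig(0.5, 1, targets, S_baseline_only, 0.1, "joint")
```

In this toy setting the focused score is zero while the joint score is positive, which mirrors the trade-off described in the abstract: the joint target also rewards learning about the baseline outcome surface, whereas the CATE target ignores anything that does not change the treatment contrast.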
