Active Causal Experimentalist (ACE): Learning Intervention Strategies via Direct Preference Optimization

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently designing sequential interventional experiments in causal discovery to maximize information gain while incorporating prior knowledge. The authors propose a preference-based sequential experimental design framework that formulates intervention strategies as a sequential decision-making process. By learning relative preferences between pairs of interventions—rather than absolute rewards—the approach enables adaptive planning through Direct Preference Optimization (DPO). This method leverages preference data to train policies without relying on non-stationary reward signals and autonomously learns theoretically grounded intervention patterns, such as focusing on parent variables. Experiments on synthetic, physical simulation, and economic datasets demonstrate that, under identical intervention budgets, the proposed method achieves a 70–71% improvement over baseline approaches (p < 0.001, Cohen’s d ≈ 2).

📝 Abstract
Discovering causal relationships requires controlled experiments, but experimentalists face a sequential decision problem: each intervention reveals information that should inform what to try next. Traditional approaches such as random sampling, greedy information maximization, and round-robin coverage treat each decision in isolation, unable to learn adaptive strategies from experience. We propose Active Causal Experimentalist (ACE), which learns experimental design as a sequential policy. Our key insight is that while absolute information gains diminish as knowledge accumulates (making value-based RL unstable), relative comparisons between candidate interventions remain meaningful throughout. ACE exploits this via Direct Preference Optimization, learning from pairwise intervention comparisons rather than non-stationary reward magnitudes. Across synthetic benchmarks, physics simulations, and economic data, ACE achieves a 70–71% improvement over baselines at equal intervention budgets (p < 0.001, Cohen's d ≈ 2). Notably, the learned policy autonomously discovers that collider mechanisms require concentrated interventions on parent variables, a theoretically grounded strategy that emerges purely from experience. This suggests preference-based learning can recover principled experimental strategies, complementing theory with learned domain adaptation.
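The abstract's key mechanism, learning from pairwise comparisons via Direct Preference Optimization rather than absolute reward magnitudes, can be sketched with the standard DPO loss on a single preference pair. This is a minimal illustration of the generic DPO objective, not the paper's implementation; the function name, argument names, and the choice of `beta` are assumptions.

```python
import math

def dpo_loss(logp_preferred, logp_rejected,
             ref_logp_preferred, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is a log-probability of proposing an intervention,
    either under the current policy (logp_*) or under a frozen
    reference policy (ref_logp_*). beta scales the implicit KL
    penalty; 0.1 is a common illustrative choice, not the paper's
    setting.
    """
    # Implicit reward margin between the preferred and rejected intervention
    margin = beta * ((logp_preferred - ref_logp_preferred)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written stably via log1p
    return math.log1p(math.exp(-margin))

# With no separation between the two candidates, the loss is log(2)
print(round(dpo_loss(-1.0, -1.0, -1.0, -1.0), 4))  # → 0.6931
```

Because the loss depends only on the *difference* between the two candidates' implicit rewards, it stays well-scaled even as absolute information gains shrink over the course of an experiment, which is the stability argument the abstract makes against value-based RL.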
Problem

Research questions and friction points this paper is trying to address.

causal discovery
intervention design
sequential decision making
active learning
experimental design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active Causal Experimentalist
Direct Preference Optimization
Causal Discovery
Sequential Intervention
Preference-based Learning