🤖 AI Summary
This work proposes CAFE, a causally guided sequential decision-making framework for automated feature engineering (AFE). It targets a limitation of existing methods, which rely on statistical heuristics and generalize poorly under distribution shift. The approach first learns a sparse directed acyclic graph over the features and the target, then uses it as a soft causal prior that groups features by their causal influence on the target. Feature construction is then cast as a multi-agent reinforcement learning problem, with exploration at the level of causal groups and hierarchical reward shaping that favor causally plausible transformations while controlling feature complexity. On 15 public benchmarks, the method improves on strong AFE baselines by up to 7% and converges in fewer episodes. Under controlled covariate shift, its performance drop is roughly 4x smaller than that of a non-causal multi-agent baseline, and it yields more compact feature sets with more stable post-hoc attributions.
📝 Abstract
Automated feature engineering (AFE) enables AI systems to autonomously construct high-utility representations from raw tabular data. However, existing AFE methods rely on statistical heuristics, yielding brittle features that fail under distribution shift. We introduce CAFE, a framework that reformulates AFE as a causally guided sequential decision process, bridging causal discovery with reinforcement learning-driven feature construction. Phase I learns a sparse directed acyclic graph over features and the target to obtain soft causal priors, grouping features as direct, indirect, or other based on their causal influence with respect to the target. Phase II uses a cascading multi-agent deep Q-learning architecture to select causal groups and transformation operators, with hierarchical reward shaping and causal group-level exploration strategies that favor causally plausible transformations while controlling feature complexity. Across 15 public benchmarks (classification with macro-F1; regression with inverse relative absolute error), CAFE achieves up to 7% improvement over strong AFE baselines, reduces episodes-to-convergence, and delivers competitive time-to-target. Under controlled covariate shifts, CAFE reduces the performance drop by ~4x relative to a non-causal multi-agent baseline, and produces more compact feature sets with more stable post-hoc attributions. These findings underscore that causal structure, used as a soft inductive prior rather than a rigid constraint, can substantially improve the robustness and efficiency of automated feature engineering.
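
The Phase I grouping can be illustrated with a minimal sketch. It assumes a DAG has already been learned and is given as a list of (cause, effect) edges, and it assumes one plausible reading of the three groups: "direct" features are parents of the target, "indirect" features are other ancestors of the target, and everything else is "other". The abstract does not spell out the exact rule, and the actual method treats the graph as a soft prior rather than a hard partition, so the function name, the feature names, and the grouping rule below are all hypothetical.

```python
from collections import deque


def group_features_by_causal_role(edges, features, target):
    """Assign each feature to 'direct', 'indirect', or 'other'.

    Assumed rule (not taken from the paper): parents of the target are
    'direct', remaining ancestors are 'indirect', the rest are 'other'.
    `edges` is an iterable of (cause, effect) pairs from a learned DAG.
    """
    # Map every node to the set of its parents.
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)

    direct = set(parents.get(target, set()))

    # Walk upward from the target's parents to collect all ancestors.
    ancestors = set()
    queue = deque(direct)
    while queue:
        node = queue.popleft()
        if node in ancestors:
            continue
        ancestors.add(node)
        queue.extend(parents.get(node, ()))

    indirect = ancestors - direct

    groups = {}
    for feature in features:
        if feature in direct:
            groups[feature] = "direct"
        elif feature in indirect:
            groups[feature] = "indirect"
        else:
            groups[feature] = "other"
    return groups


# Hypothetical toy graph: the feature names are illustrative only.
edges = [
    ("age", "income"),
    ("education", "income"),
    ("income", "default"),
    ("debt", "default"),
]
features = ["age", "education", "income", "debt", "zip_code"]
groups = group_features_by_causal_role(edges, features, target="default")
print(groups)
```

In this toy graph, `income` and `debt` point straight at the target, `age` and `education` reach it only through `income`, and `zip_code` has no path to it. A group-level exploration strategy could then sample from these groups with different probabilities instead of discarding the "other" group outright, which is in keeping with the soft-prior framing of the abstract.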