AI Summary
This work addresses the fundamental tension between in-sample fit and out-of-sample generalization: while ordinary least squares (OLS) minimizes training error, it lacks robustness to distributional shifts; conversely, causal models offer strong out-of-distribution guarantees but sacrifice in-sample accuracy. To bridge this gap, we propose **causal regularization**, a framework that formally characterizes a continuous trade-off between causal strength and empirical risk. Grounded in structural causal models, our approach unifies regularization, subsample stability analysis, and finite-sample generalization theory, yielding finite-sample risk bounds. We further prove that cross-validation is adequate for attaining these bounds. Both theoretical analysis and empirical evaluation indicate that the method preserves high in-sample fidelity while substantially improving robustness and reliability under distributional shift.
Abstract
In recent decades, a number of practical approaches to causality, such as propensity score matching, the PC algorithm, and invariant causal prediction, have been introduced. Besides its interpretational appeal, the causal model provides the best out-of-sample prediction guarantees. In this paper, we study the identification of causal-like models from in-sample data that provide out-of-sample risk guarantees when predicting a target variable from a set of covariates. Whereas ordinary least squares attains the best in-sample risk with limited out-of-sample guarantees, causal models have the best out-of-sample guarantees but an inferior in-sample risk. By defining a trade-off between these properties, we introduce *causal regularization*. As the regularization increases, it yields estimators whose risk is more stable across sub-samples, at the cost of a higher overall in-sample risk. This increased risk stability is shown to lead to out-of-sample risk guarantees. We provide finite-sample risk bounds for all models and prove the adequacy of cross-validation for attaining these bounds.
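To make the trade-off concrete, below is a minimal sketch in Python. It is not the paper's estimator: the penalty (the variance of the squared-error risk over fixed random sub-samples), the Nelder-Mead solver, and the k-fold scheme for choosing the regularization level are illustrative assumptions that mirror the abstract's two ingredients, sub-sample risk stability and cross-validation.

```python
# Illustrative sketch only, NOT the estimator from the paper.
# Idea: trade in-sample risk against the stability of that risk
# across sub-samples, controlled by a regularization parameter lam.
import numpy as np
from scipy.optimize import minimize

def make_subsamples(n, n_subsamples=20, frac=0.5, seed=0):
    # Fix the sub-samples up front so the objective is deterministic.
    rng = np.random.default_rng(seed)
    return [rng.choice(n, size=int(frac * n), replace=False)
            for _ in range(n_subsamples)]

def objective(beta, X, y, subsets, lam):
    # In-sample risk + lam * variance of the risk across sub-samples
    # (an assumed stand-in for the paper's stability notion).
    in_sample = np.mean((y - X @ beta) ** 2)
    risks = [np.mean((y[s] - X[s] @ beta) ** 2) for s in subsets]
    return in_sample + lam * np.var(risks)

def fit_causal_reg(X, y, lam, subsets):
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # lam = 0 recovers OLS
    res = minimize(objective, beta_ols, args=(X, y, subsets, lam),
                   method="Nelder-Mead")
    return res.x

def select_lambda_cv(X, y, lambdas, k=5, seed=0):
    # Plain k-fold cross-validation over lam, echoing the paper's use of CV.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for lam in lambdas:
        errs = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            subsets = make_subsamples(len(train))
            beta = fit_causal_reg(X[train], y[train], lam, subsets)
            errs.append(np.mean((y[test] - X[test] @ beta) ** 2))
        scores.append(np.mean(errs))
    return lambdas[int(np.argmin(scores))]
```

With `lam = 0` the objective reduces to OLS (the best in-sample risk); as `lam` grows, the fit is pushed toward estimators whose risk varies less across sub-samples, which is the stability property the abstract connects to out-of-sample guarantees.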