AI Summary
In observational causal inference, covariate balancing via feature selection often struggles to capture nonlinearities and higher-order interactions. This paper proposes Forest Kernel Balancing, a method that combines the implicit leaf-co-occurrence kernels induced by random forests and Bayesian Additive Regression Trees (BART) with outcome-guided feature learning. It automatically extracts the nonlinear and high-order interaction features that matter for predicting potential outcomes and embeds them directly into the covariate balancing procedure. Balancing thus becomes endogenous to the causal-effect estimation objective, overcoming a key limitation of conventional kernel-based approaches, which ignore outcome information. Extensive simulations and empirical analyses demonstrate that the method substantially improves the accuracy and stability of treatment effect estimation, reducing both bias and variance simultaneously, while its computational efficiency surpasses that of standard kernel balancing methods.
Abstract
While balancing covariates between groups is central to observational causal inference, selecting which features to balance remains a challenging problem. Kernel balancing is a promising approach that first estimates a kernel capturing similarity across units and then balances a (possibly low-dimensional) summary of that kernel, indirectly learning which features are important to balance. In this paper, we propose forest kernel balancing, which leverages the underappreciated fact that tree-based machine learning models, namely random forests and Bayesian additive regression trees (BART), implicitly estimate a kernel based on the co-occurrence of observations in the same terminal leaf node. Because the trees are grown to predict the outcome, the resulting kernel, although solely a function of baseline features, emphasizes the nonlinearities and interactions that are important for predicting the outcome, and therefore for addressing confounding. Through simulations and applied illustrations, we show that forest kernel balancing yields meaningful computational and statistical improvements relative to standard kernel methods, which do not incorporate outcome information when learning features.
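To make the leaf co-occurrence idea concrete, the sketch below builds the proximity kernel the abstract alludes to: fit a forest to predict the outcome, then define K[i, j] as the fraction of trees in which units i and j land in the same terminal leaf. This is an illustrative reconstruction using scikit-learn, not the authors' implementation; the data, forest size, and function name are hypothetical.

```python
# Sketch of a forest proximity kernel (illustrative, not the paper's code).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forest_kernel(X, y, n_trees=200, random_state=0):
    """K[i, j] = fraction of trees in which units i and j fall in the
    same terminal leaf. Each per-tree co-occurrence matrix is PSD (a
    block matrix of ones over leaf membership), so their average is a
    valid kernel."""
    forest = RandomForestRegressor(
        n_estimators=n_trees, random_state=random_state
    ).fit(X, y)
    leaves = forest.apply(X)  # shape (n_units, n_trees): leaf index per tree
    n = X.shape[0]
    K = np.zeros((n, n))
    for t in range(n_trees):
        col = leaves[:, t]
        K += (col[:, None] == col[None, :])  # 1 where i, j share a leaf
    return K / n_trees

# Toy data with a nonlinearity and an interaction (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=100)
K = forest_kernel(X, y)
```

Because the trees split on features that predict y, pairs of units that are similar on outcome-relevant nonlinearities and interactions receive high kernel values, which is exactly what a downstream balancing step would then target.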