StochTree: BART-based modeling in R and Python

📅 2025-12-12
🤖 AI Summary
Existing BART-related packages suffer from limited R/Python interoperability, insufficient model coverage (e.g., heteroskedastic forests, random effects, treed linear models), and inflexible MCMC customization that often requires modifying C++ source code. To address these limitations, we introduce *stochtree*, a C++ library with bindings to both R and Python that supports BART, Bayesian Causal Forests (BCF), and user-specified variations. Its key contributions are: (1) native support for heteroskedastic forests, random effects, and leaf-wise linear regression (treed linear) models; (2) flexible handling of model fits, including saving fits, reinitializing models from existing fits, and passing fits between R and Python; and (3) lower-level APIs in both languages that let users build their own Gibbs and Metropolis–Hastings routines around Bayesian tree ensembles without modifying C++ code. The paper illustrates stochtree in three settings: standard BART and BCF applications, models with heteroskedasticity and leaf-wise regression, and custom MCMC routines for nonstandard tree ensemble models.
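To make "leaf-wise linear regression" concrete: in a treed linear model each leaf stores a coefficient vector rather than a single constant, and a tree's contribution at covariates x is the inner product of that vector with a basis vector z. The following is a minimal, hypothetical Python sketch of prediction only. It does not use stochtree's API or data structures; `Node`, `leaf_for`, and `predict_treed_linear` are names invented for this illustration, and the sampling of splits and leaf coefficients is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """Hypothetical tree node: internal nodes hold a split, leaves hold coefficients."""
    feature: Optional[int] = None       # split variable index (internal nodes)
    threshold: Optional[float] = None   # route left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    coef: Optional[List[float]] = None  # leaf coefficient vector (leaves only)


def leaf_for(node: Node, x: Sequence[float]) -> Node:
    """Follow the splits on covariates x down to a leaf."""
    while node.coef is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node


def predict_treed_linear(trees: List[Node], x: Sequence[float], z: Sequence[float]) -> float:
    """Ensemble prediction: sum over trees of the leaf coefficients dotted with basis z."""
    total = 0.0
    for tree in trees:
        coef = leaf_for(tree, x).coef
        if len(coef) != len(z):
            raise ValueError("basis and leaf coefficients must have the same length")
        total += sum(zk * bk for zk, bk in zip(z, coef))
    return total


# Usage: tree 1 splits on x[0]; tree 2 is a single leaf. The basis z = [1, t]
# gives every leaf an intercept and a slope in t.
forest = [
    Node(feature=0, threshold=0.5,
         left=Node(coef=[1.0, 2.0]), right=Node(coef=[0.0, -1.0])),
    Node(coef=[0.5, 0.0]),
]
print(predict_treed_linear(forest, x=[0.3], z=[1.0, 2.0]))  # 5.5
print(predict_treed_linear(forest, x=[0.9], z=[1.0, 2.0]))  # -1.5
```

With the constant basis z = [1.0], each leaf contributes one scalar and the prediction reduces to the usual BART sum of leaf values.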

📝 Abstract
stochtree is a C++ library for Bayesian tree ensemble models such as BART and Bayesian Causal Forests (BCF), as well as user-specified variations. Unlike previous BART packages, stochtree provides bindings to both R and Python for full interoperability. stochtree boasts a more comprehensive range of models relative to previous packages, including heteroskedastic forests, random effects, and treed linear models. Additionally, stochtree offers flexible handling of model fits: the ability to save model fits, reinitialize models from existing fits (facilitating improved model initialization heuristics), and pass fits between R and Python. On both platforms, stochtree exposes lower-level functionality, allowing users to specify models incorporating Bayesian tree ensembles without needing to modify C++ code. We illustrate the use of stochtree in three settings: i) straightforward applications of existing models such as BART and BCF, ii) models that include more sophisticated components like heteroskedasticity and leaf-wise regression models, and iii) as a component of custom MCMC routines to fit nonstandard tree ensemble models.
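Setting iii), a tree ensemble as one component of a custom MCMC routine, amounts to a Gibbs sampler that alternates between updating the ensemble and updating other parameters given the current residuals. The sketch below shows only that outer loop plus the standard conjugate inverse-gamma draw for a global error variance; it does not call stochtree. The mean update is a hypothetical stand-in (a constant mean with a normal prior) occupying the slot where a forest sampling step would go, and all function names and the prior values `a`, `b`, and `tau2` are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

MeanUpdate = Callable[[Sequence[float], float, random.Random], List[float]]


def sample_sigma2(residuals: Sequence[float], a: float, b: float, rng: random.Random) -> float:
    """Conjugate draw sigma^2 ~ InvGamma(a + n/2, b + SSR/2) given residuals."""
    shape = a + len(residuals) / 2.0
    rate = b + 0.5 * sum(r * r for r in residuals)
    # gammavariate(alpha, beta) takes a *scale* beta, so 1/rate yields Gamma(shape, rate);
    # the reciprocal of a Gamma(shape, rate) draw is InvGamma(shape, rate).
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def constant_mean_update(tau2: float) -> MeanUpdate:
    """Hypothetical stand-in for a forest update: mu | y, sigma^2 with prior mu ~ N(0, tau2)."""
    def update(y: Sequence[float], sigma2: float, rng: random.Random) -> List[float]:
        precision = len(y) / sigma2 + 1.0 / tau2
        mean = (sum(y) / sigma2) / precision
        mu = rng.gauss(mean, (1.0 / precision) ** 0.5)
        return [mu] * len(y)  # fitted values, one per observation, as a forest would return
    return update


def gibbs(y: Sequence[float], update_mean: MeanUpdate, num_iter: int = 1000,
          a: float = 1.0, b: float = 1.0, seed: int = 0) -> List[Tuple[float, float]]:
    """Alternate: (1) update the mean component given sigma^2, (2) sigma^2 given residuals."""
    rng = random.Random(seed)
    sigma2 = 1.0
    draws = []
    for _ in range(num_iter):
        fitted = update_mean(y, sigma2, rng)  # in a tree model: resample the forest here
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        sigma2 = sample_sigma2(residuals, a, b, rng)
        draws.append((fitted[0], sigma2))  # fitted value of observation 0 (all equal here)
    return draws


# Usage: simulate y ~ N(3, 0.5^2) and check that the posterior draws recover it.
data_rng = random.Random(42)
y = [data_rng.gauss(3.0, 0.5) for _ in range(200)]
draws = gibbs(y, constant_mean_update(tau2=100.0), num_iter=2000)
kept = draws[200:]  # discard burn-in
mu_hat = sum(d[0] for d in kept) / len(kept)
sigma2_hat = sum(d[1] for d in kept) / len(kept)
print(round(mu_hat, 2), round(sigma2_hat, 2))
```

Keeping the mean update behind a callable is the design point: the variance step only needs residuals, so any sampler that returns fitted values (a forest, a forest plus random effects) can be swapped into the same loop.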
Problem

Provides Bayesian tree ensemble models in R and Python
Offers flexible model handling and interoperability between platforms
Enables custom model variations without modifying C++ code
Innovation

C++ library with R and Python bindings
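Broader model coverage, including heteroskedastic forests, random effects, and treed linear models
Flexible handling of model fits: saving, reinitializing from existing fits, and passing fits between R and Python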
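A heteroskedastic forest pairs a mean ensemble f(x) with a second ensemble for the noise variance, so that y | x ~ N(f(x), σ²(x)). The sketch below assumes a log-linear parameterization, σ²(x) = exp(Σ_j s_j(x)), which keeps the variance positive; that choice, the depth-1 `Stump` trees, and all function names are illustrative assumptions rather than a description of stochtree's internals. It evaluates the Gaussian log-likelihood that a sampler for either forest would work with.

```python
import math
from typing import List, NamedTuple, Sequence


class Stump(NamedTuple):
    """Depth-1 tree: returns `left` if x[feature] <= threshold, else `right`."""
    feature: int
    threshold: float
    left: float
    right: float

    def __call__(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def mean_function(trees: List[Stump], x: Sequence[float]) -> float:
    """Mean forest: f(x) is the sum of the tree outputs."""
    return sum(t(x) for t in trees)


def variance_function(trees: List[Stump], x: Sequence[float]) -> float:
    """Variance forest (assumed log-linear): sigma^2(x) = exp(sum of tree outputs) > 0."""
    return math.exp(sum(t(x) for t in trees))


def log_likelihood(y: Sequence[float], X: Sequence[Sequence[float]],
                   mean_trees: List[Stump], var_trees: List[Stump]) -> float:
    """Gaussian log-likelihood of y_i ~ N(f(x_i), sigma^2(x_i))."""
    total = 0.0
    for yi, xi in zip(y, X):
        mu = mean_function(mean_trees, xi)
        s2 = variance_function(var_trees, xi)
        total += -0.5 * (math.log(2.0 * math.pi * s2) + (yi - mu) ** 2 / s2)
    return total


# Usage: the modeled noise variance is 1 for x[0] <= 0.5 and 4 for x[0] > 0.5.
mean_trees = [Stump(0, 0.5, 1.0, 2.0)]
var_trees = [Stump(0, 0.5, 0.0, math.log(4.0))]
print(variance_function(var_trees, [0.2]), variance_function(var_trees, [0.8]))
# The same residual of 2.0 is penalized less where the modeled variance is larger.
print(log_likelihood([3.0], [[0.2]], mean_trees, var_trees))
print(log_likelihood([4.0], [[0.8]], mean_trees, var_trees))
```

A homoskedastic BART model is the special case where the variance ensemble is constant, so σ²(x) does not depend on x.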