🤖 AI Summary
BART struggles to model smooth functions because its piecewise-constant outputs are inherently discontinuous, which limits its applicability in spatial regression and related smooth estimation tasks. To address this, we propose ridgeBART, a Bayesian additive regression tree framework whose trees output linear combinations of ridge functions, marking the first integration of localized single-hidden-layer neural networks into the BART paradigm for efficient nonparametric estimation of piecewise anisotropic Hölder functions. Our contributions include: (1) learnable directional ridge basis functions that enhance smoothness; (2) a linear-time MCMC sampler enabling scalable training on datasets with up to millions of observations; and (3) posterior contraction rates for piecewise anisotropic Hölder functions, which are nearly minimax-optimal in the isotropic case. Experiments on synthetic benchmarks and real-world NBA shot chart modeling demonstrate substantial improvements in spatial smoothness and predictive accuracy, while preserving scalability and statistical rigor.
📝 Abstract
Although it is an extremely effective, easy-to-use, and increasingly popular tool for nonparametric regression, the Bayesian Additive Regression Trees (BART) model is limited by the fact that it can only produce discontinuous output. Initial attempts to overcome this limitation were based on regression trees that output Gaussian processes instead of constants. Unfortunately, implementations of these extensions cannot scale to large datasets. We propose ridgeBART, an extension of BART built with trees that output linear combinations of ridge functions (i.e., compositions of an affine transformation of the inputs and a non-linearity); that is, we build a Bayesian ensemble of localized neural networks with a single hidden layer. We develop a new MCMC sampler that updates trees in linear time and establish posterior contraction rates for estimating piecewise anisotropic Hölder functions and nearly minimax-optimal rates for estimating isotropic Hölder functions. We demonstrate ridgeBART's effectiveness on synthetic data and use it to estimate the probability that a professional basketball player makes a shot from any location on the court in a spatially smooth fashion.
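
To make the ridge-function construction concrete, the following is a minimal sketch of what a single tree leaf might compute: a linear combination of non-linearities, each applied to an affine transformation of the inputs. The names `omega`, `b`, and `beta`, the `tanh` activation, and the depth-one tree are illustrative assumptions, not the authors' implementation or parameterization.

```python
import math

def ridge(x, omega, b, activation=math.tanh):
    """One ridge function: a non-linearity applied to the affine map omega . x + b.

    The tanh activation is an assumption chosen for illustration.
    """
    return activation(sum(w * xi for w, xi in zip(omega, x)) + b)

def leaf_output(x, omegas, offsets, betas):
    """Linear combination of ridge functions, i.e. a tiny one-hidden-layer network."""
    return sum(
        beta * ridge(x, omega, b)
        for omega, b, beta in zip(omegas, offsets, betas)
    )

def tree_predict(x, cut, left_params, right_params):
    """Hypothetical depth-one tree: split on x[0] < cut, then evaluate that leaf.

    Each leaf carries its own (omegas, offsets, betas), so the output is
    smooth inside a leaf's region rather than constant.
    """
    params = left_params if x[0] < cut else right_params
    return leaf_output(x, *params)

# Example leaf parameters: (directions, offsets, outer weights).
left = ([[1.0, 0.0]], [0.0], [2.0])
right = ([[0.0, 1.0], [1.0, 1.0]], [0.5, -0.5], [1.0, -1.0])
prediction = tree_predict([0.5, 3.0], cut=1.0, left_params=left, right_params=right)
```

In a BART-style model the final prediction would be the sum of many such trees; within one leaf, the output varies smoothly with the inputs instead of being a single constant.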