Classification Trees with Valid Inference via the Exponential Mechanism

📅 2025-11-19
🤖 AI Summary
Decision trees offer interpretability and nonlinear modeling capability, yet their highly adaptive structure impedes valid statistical inference. To address this, we propose a probabilistic tree-growing method for classification trees based on a temperature-controlled exponential mechanism: it replaces greedy splitting with stochastic split selection, which makes it possible to account for the adaptively chosen tree structure and to construct pivots directly from the split-sampling probabilities. We establish that these pivots give asymptotically valid inference on the parameters of the predictive fit. Empirically, our approach delivers more powerful inference than data splitting while preserving predictive accuracy. The core contribution is a combination of interpretability, predictive performance, and valid statistical inference for nonlinear tree models.

📝 Abstract
Decision trees are widely used for non-linear modeling, as they capture interactions between predictors while producing inherently interpretable models. Despite their popularity, performing inference on the non-linear fit remains largely unaddressed. This paper focuses on classification trees and makes two key contributions. First, we introduce a novel tree-fitting method that replaces the greedy splitting of the predictor space in standard tree algorithms with a probabilistic approach. Each split in our approach is selected according to sampling probabilities defined by an exponential mechanism, with a temperature parameter controlling its deviation from the deterministic choice given the data. Second, while our approach can fit a tree that, with high probability, approximates the fit produced by standard tree algorithms at high temperatures, it is not merely predictive: unlike standard algorithms, it enables valid inference by taking into account the highly adaptive tree structure. Our method produces pivots directly from the sampling probabilities in the exponential mechanism. In theory, our pivots allow asymptotically valid inference on the parameters in the predictive fit, and in practice, our method delivers powerful inference without sacrificing predictive accuracy, in contrast to data splitting methods.

Problem

Research questions and friction points this paper is trying to address.

Develops probabilistic tree-fitting method using exponential mechanism sampling
Enables valid statistical inference for highly adaptive classification trees
Produces asymptotically valid pivots without sacrificing predictive accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic splitting via exponential mechanism
Valid inference from adaptive tree structure
Pivots derived from sampling probabilities
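
The probabilistic splitting step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The Gini-based gain, the helper names (`gini`, `split_gains`, `split_probabilities`, `sample_split`), the toy data, and the parametrization are all assumptions made for illustration. In particular, the sketch weights each candidate split by exp(temperature × gain), so that larger temperature values concentrate the probability on the greedy split, matching the abstract's statement that high temperatures approximate the standard fit. The paper's exact utility and scaling may differ, and the pivot construction is not shown.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def split_gains(x, y, thresholds):
    """Impurity reduction of each candidate threshold on one feature."""
    n = len(y)
    parent = gini(y)
    gains = []
    for t in thresholds:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gains.append(parent - child)
    return gains


def split_probabilities(gains, temperature):
    """Exponential-mechanism probabilities, proportional to exp(temperature * gain).

    Assumed parametrization: larger `temperature` puts more mass on the
    greedy (maximum-gain) split.
    """
    scaled = [temperature * g for g in gains]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_split(thresholds, probs, rng):
    """Draw one threshold according to the sampling probabilities."""
    return rng.choices(thresholds, weights=probs, k=1)[0]


# Toy data (hypothetical): one feature, perfectly separable at 0.5.
x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 1, 1, 1]
thresholds = [0.25, 0.5, 0.75]

gains = split_gains(x, y, thresholds)
probs_low = split_probabilities(gains, temperature=1.0)
probs_high = split_probabilities(gains, temperature=200.0)

rng = random.Random(0)
chosen = sample_split(thresholds, probs_high, rng)
```

With the small temperature value, the three candidate splits receive comparable probabilities. With the large value, nearly all the mass sits on the greedy split at 0.5. The sampling probabilities returned by `split_probabilities` are the quantities from which the abstract says the pivots are built.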