CART Random Forests as Sequential Allocation over Random Opportunity Sets: A Stochastic-Control Theory of Ensemble Risk

📅 2026-05-26
🤖 AI Summary
This study addresses the error structure and cross-tree interaction mechanisms of CART-based random forests under feature subsampling. It introduces the CART-ROSA framework, which models the feature subset drawn at each node as a random set of admissible actions and formalizes the forest as a sequential allocation problem over these random opportunity sets. By combining stochastic control theory, the CART splitting rule, and a controlled process over informative split counts, the framework separates two design levers: the informative-opportunity rate induced by feature subsampling and the contraction strength of the within-mask split policy. The analysis shows that the CART policy is locally stabilizing but can be globally suboptimal for the forest objective. For the linear model, the work derives an explicit expansion of the mean squared error risk.
📝 Abstract
CART random forests are among the most widely used modern predictive methods, with well-documented empirical success. Yet, at the mechanistic level, the algorithm is often treated as a black box because of its complexity. In this paper, we develop a stochastic-control perspective on feature-subsampled CART random forests, named CART random opportunity-set allocation (CART-ROSA). At each node, the random subset of features is interpreted as a random feasible action set, and the CART split rule as a masked-action allocation policy. This policy induces a controlled stochastic process over informative split-count states, whose terminal law determines both the single-tree error and the cross-tree interaction terms in the forest mean squared error (MSE). This representation opens the black box of CART forests by separating two design levers: the informative-opportunity rate induced by feature subsampling, and the contraction strength of the within-mask split policy. We establish that the CART policy is locally stabilizing: it contracts imbalances in informative split allocations and concentrates terminal tree geometry. At the system level, however, it can be globally suboptimal for the forest objective. Specializing to the linear model, we derive the MSE risk expansion explicitly. Our results show how an operations-research perspective makes tractable a theoretical question that is difficult to access from the standard algorithmic description of CART forests.
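The masked-action view described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not taken from the paper: features `0..s-1` are labeled informative, each node on a path draws a uniform mask of `mtry` out of `p` features, and CART's impurity-based rule is replaced by a hypothetical "balancing" rule that picks the least-split informative feature in the mask. All function names and parameters are illustrative. The closed form in `informative_opportunity_rate` is the standard hypergeometric probability that a mask contains at least one informative feature.

```python
import math
import random


def informative_opportunity_rate(p, s, mtry):
    """Probability that a uniform size-mtry mask over p features
    contains at least one of the s informative features."""
    return 1.0 - math.comb(p - s, mtry) / math.comb(p, mtry)


def simulate_path(p, s, mtry, depth, policy, rng):
    """Follow one root-to-leaf path of `depth` nodes and return the
    split counts on the s informative features (indices 0..s-1)."""
    counts = [0] * s
    for _ in range(depth):
        mask = rng.sample(range(p), mtry)        # random opportunity set
        admissible = [j for j in mask if j < s]  # informative actions in the mask
        if not admissible:
            continue  # no informative opportunity at this node
        if policy == "balancing":
            # Hypothetical stand-in for a contracting policy (not CART itself):
            # keep only the admissible features that have been split least.
            low = min(counts[j] for j in admissible)
            admissible = [j for j in admissible if counts[j] == low]
        # Any other policy name falls through to a uniform choice in the mask.
        counts[rng.choice(admissible)] += 1
    return counts


def mean_imbalance(p, s, mtry, depth, policy, n_paths=2000, seed=0):
    """Average (max - min) informative split count over simulated paths."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_paths):
        counts = simulate_path(p, s, mtry, depth, policy, rng)
        total += max(counts) - min(counts)
    return total / n_paths
```

Calling `mean_imbalance(20, 4, 10, 30, "balancing")` and `mean_imbalance(20, 4, 10, 30, "uniform")` compares the two toy policies at a fixed opportunity rate, while varying `mtry` changes `informative_opportunity_rate(p, s, mtry)` with the policy held fixed. In this toy setting, that is the sense in which the two levers can be moved separately.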
Problem

Research questions and friction points this paper is trying to address.

CART random forests
stochastic control
feature subsampling
mean squared error
black-box interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

stochastic control
CART random forests
feature subsampling
sequential allocation
ensemble risk