CART Random Forests as Sequential Allocation over Random Opportunity Sets: A Stochastic-Control Theory of Ensemble Risk

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines the error structure and cross-tree interactions of CART random forests under feature subsampling. It introduces the CART-ROSA (CART random opportunity-set allocation) framework, which treats the random feature subset drawn at each node as a random set of admissible actions and the CART split rule as an allocation policy restricted to that set. The policy induces a controlled stochastic process over informative split counts, and the framework separates two design levers: the informative-opportunity rate induced by feature subsampling, and the contraction strength of the within-mask split policy. The analysis shows that the CART policy is locally stabilizing yet can be globally suboptimal for the forest objective. For the linear model, the paper derives an explicit mean squared error (MSE) risk expansion.
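
To show why both single-tree error and cross-tree interaction terms appear in the forest MSE, the generic algebraic identity for an average of trees is sketched below. The notation is an assumption made for illustration and is not taken from the paper: $T_b(x)$ is the prediction of tree $b$, $f(x)$ the target, $e_b(x) = T_b(x) - f(x)$ the tree error, and $B$ the number of trees. This is the standard ensemble decomposition, not the paper's risk expansion.

```latex
% Forest prediction as an average of B trees (illustrative notation)
F_B(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x), \qquad e_b(x) = T_b(x) - f(x).

% Squared error splits into single-tree terms and cross-tree terms
\mathbb{E}\big[(F_B(x) - f(x))^2\big]
  = \frac{1}{B^2} \sum_{b=1}^{B} \mathbb{E}\big[e_b(x)^2\big]
  + \frac{1}{B^2} \sum_{b \neq b'} \mathbb{E}\big[e_b(x)\, e_{b'}(x)\big].

% If the trees are exchangeable, this reduces to
\mathbb{E}\big[(F_B(x) - f(x))^2\big]
  = \frac{1}{B}\, \mathbb{E}\big[e_1(x)^2\big]
  + \Big(1 - \frac{1}{B}\Big)\, \mathbb{E}\big[e_1(x)\, e_2(x)\big].
```

As $B$ grows, the single-tree term vanishes and the cross-tree term dominates, which is why the interaction between trees matters for the forest objective.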

📝 Abstract

CART random forests are among the most widely used modern predictive methods, with well-documented empirical success. Yet, at the mechanistic level, the algorithm is often treated as a black box because of its complexity. In this paper, we develop a stochastic-control perspective on feature-subsampled CART random forests, which we call CART random opportunity-set allocation (CART-ROSA). At each node, the random subset of features is interpreted as a random feasible action set, and the CART split rule as a masked-action allocation policy. This policy induces a controlled stochastic process over informative split-count states, whose terminal law determines both the single-tree error and the cross-tree interaction terms in the forest mean squared error (MSE). This representation opens the black box of CART forests by separating two design levers: the informative-opportunity rate induced by feature subsampling, and the contraction strength of the within-mask split policy. We establish that the CART policy is locally stabilizing: it contracts imbalances in informative split allocations and concentrates terminal tree geometry. At the system level, however, it can be globally suboptimal for the forest objective. Specializing to the linear model, we derive the MSE risk expansion explicitly. Our results show how an operations-research perspective makes tractable a theoretical gap that is difficult to access from the standard algorithmic description of CART forests.
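
The masked-action view described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's construction: the function names (`opportunity_rate`, `balancing_policy`, `simulate_path`), the parameters (`p` features, of which the first `s` are informative, `mtry` features drawn per node), and above all the balancing rule are hypothetical. Real CART chooses splits by impurity decrease on data; the rule here is only a stand-in for a policy that contracts imbalances in informative split counts.

```python
import random
from math import comb


def opportunity_rate(p, s, mtry):
    """Probability that a uniformly drawn mask of mtry features (out of p)
    contains at least one of the s informative features."""
    return 1.0 - comb(p - s, mtry) / comb(p, mtry)


def balancing_policy(mask, counts):
    """Hypothetical within-mask policy (a stand-in, NOT the CART criterion):
    among informative features present in the mask, pick the one split on
    least often so far. Returns None when the mask has no informative feature."""
    informative = [j for j in mask if j in counts]
    if not informative:
        return None
    return min(informative, key=lambda j: (counts[j], j))


def simulate_path(p, s, mtry, depth, rng):
    """Follow one root-to-leaf path of the given depth and return the
    terminal informative split counts (the state of the controlled process)."""
    counts = {j: 0 for j in range(s)}  # features 0..s-1 are informative
    for _ in range(depth):
        mask = rng.sample(range(p), mtry)  # random feasible action set
        action = balancing_policy(mask, counts)
        if action is not None:  # otherwise the split is uninformative
            counts[action] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    p, s, mtry, depth = 20, 3, 4, 12
    print("informative-opportunity rate:", opportunity_rate(p, s, mtry))
    print("terminal split counts:", simulate_path(p, s, mtry, depth, rng))
```

In this toy setup, `mtry` controls how often an informative action is available at all, while the choice of policy controls how the available informative splits are distributed across features. These correspond loosely to the two design levers named in the abstract.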

Problem

Research questions and friction points this paper is trying to address.

CART random forests

stochastic control

feature subsampling

mean squared error

black-box interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

stochastic control

CART random forests

feature subsampling