🤖 AI Summary
For right-censored survival data, the log-rank splitting criterion used in random survival forests is expensive to evaluate: each candidate split requires an O(M) recomputation, where M is the number of distinct event times, which hinders scalability to large cohorts. To address this, we propose a constant-time incremental update for the log-rank statistic. Building on LeBlanc and Crowley’s (1995) approximation, our approach eliminates redundant full recomputations at candidate splits while preserving predictive performance. We implement the optimization within the generalized random forests (grf) framework and empirically validate its scalability on large survival datasets: training speed improves by several-fold to over an order of magnitude, depending on cohort size, without compromising survival prediction accuracy or statistical consistency. This work provides an efficient and robust foundation for tree-based ensemble modeling in high-dimensional, large-scale survival analysis.
📝 Abstract
Random survival forests are widely used for estimating covariate-conditional survival functions under right-censoring. Their standard log-rank splitting criterion is typically recomputed at each candidate split. This O(M) cost per split, with M the number of distinct event times in a node, creates a bottleneck for large cohort datasets with long follow-up. We revisit approximations proposed by LeBlanc and Crowley (1995) and develop simple constant-time updates for the log-rank criterion. The method is implemented in grf and substantially reduces training time on large datasets while preserving predictive performance.
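The constant-time update can be illustrated with a simplified scheme in the spirit of the LeBlanc and Crowley approximation: if the log-rank criterion is approximated by per-observation scores (Nelson-Aalen martingale residuals, computed once per node), the split criterion reduces to a running sum, so scanning thresholds needs only an O(1) update per candidate instead of an O(M) recomputation. This is a minimal illustrative sketch, not the grf implementation; the function names and the score-based test statistic are assumptions.

```python
def nelson_aalen_scores(times, events):
    """Per-observation log-rank scores a_i = delta_i - Lambda_hat(t_i),
    where Lambda_hat is the Nelson-Aalen cumulative hazard estimate.
    Computed once per node in O(n log n)."""
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    at_risk, cum_haz = n, 0.0
    scores = [0.0] * n
    j = 0
    while j < n:
        # Group tied event times and count events d at this time.
        k, d = j, 0
        while k < n and times[order[k]] == times[order[j]]:
            d += events[order[k]]
            k += 1
        if d:
            cum_haz += d / at_risk  # Nelson-Aalen increment
        for m in range(j, k):
            i = order[m]
            scores[i] = events[i] - cum_haz
        at_risk -= (k - j)
        j = k
    return scores


def best_split(x, times, events):
    """Scan candidate thresholds on covariate x. Each step moves one
    sample into the left child and updates the score sum in O(1),
    rather than recomputing the criterion over all M event times."""
    n = len(x)
    scores = nelson_aalen_scores(times, events)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    order = sorted(range(n), key=lambda i: x[i])
    s_left, best_stat, best_thr = 0.0, -1.0, None
    for pos in range(n - 1):
        i = order[pos]
        s_left += scores[i]            # the O(1) incremental update
        if x[order[pos + 1]] == x[i]:  # cannot split between tied x values
            continue
        n_left = pos + 1
        n_right = n - n_left
        denom = n_left * n_right * var / n
        if denom <= 0:
            continue
        # Standardized two-sample statistic on the scores.
        stat = (s_left - n_left * mean) ** 2 / denom
        if stat > best_stat:
            best_stat = stat
            best_thr = (x[i] + x[order[pos + 1]]) / 2
    return best_thr, best_stat
```

Because the node-level scores are fixed while thresholds are scanned, the per-candidate cost no longer depends on the number of distinct event times; only the one-time score computation does.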