🤖 AI Summary
In high-dimensional discrete-time risk models, data augmentation imposes a substantial computational burden and makes it difficult to achieve estimation accuracy and variable selection simultaneously. To address this, we propose a batchwise backfitting algorithm grounded in the distributional regression framework and, for the first time, extend it to discrete survival models with additive predictors. Our method integrates generalized linear models with an efficient data augmentation scheme, enabling joint parameter estimation and sparse variable selection. Extensive simulations and an application to real-world data on infant mortality in sub-Saharan Africa show that the proposed approach substantially reduces runtime and improves computational scalability while maintaining high estimation accuracy and robust variable selection. This work establishes an efficient, automated modeling paradigm for large-scale discrete-time event history analysis.
📝 Abstract
Discrete-time hazard models are widely used when event times are measured in intervals or are not precisely observed. While these models can be estimated using standard generalized linear model techniques, they rely on extensive data augmentation, making estimation computationally demanding in high-dimensional settings. In this paper, we demonstrate how the recently proposed Batchwise Backfitting algorithm, a general framework for scalable estimation and variable selection in distributional regression, can be effectively extended to discrete hazard models. Using both simulated data and a large-scale application on infant mortality in sub-Saharan Africa, we show that the algorithm delivers accurate estimates, automatically selects relevant predictors, and scales efficiently to large data sets. The findings underscore the algorithm's practical utility for analysing large-scale, complex survival data with high-dimensional covariates.
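The data augmentation the abstract refers to is the standard person-period expansion: each subject contributes one row per interval in which it is at risk, with a binary response that is 1 only in the final interval if the event occurred, after which the discrete hazard can be fit as a binary-response GLM. A minimal sketch of that expansion (illustrative only; the function name and toy data are assumptions, and this is not the paper's batchwise backfitting algorithm):

```python
def person_period(records):
    """Expand (subject_id, observed_interval, event_indicator) records
    into person-period rows for discrete-time hazard estimation.

    Each subject observed up to interval t contributes t rows, one per
    interval at risk; the binary response y is 1 only in the last
    interval and only if the event occurred (event_indicator == 1).
    A logistic or cloglog GLM on these rows estimates the discrete
    hazard h(t) = P(T = t | T >= t).
    """
    rows = []
    for sid, t, d in records:
        for k in range(1, t + 1):
            y = 1 if (k == t and d == 1) else 0
            rows.append((sid, k, y))
    return rows

# Toy data: subject 1 experiences the event in interval 3,
# subject 2 is censored after interval 2.
data = [(1, 3, 1), (2, 2, 0)]
expanded = person_period(data)
# 3 + 2 = 5 person-period rows; only subject 1's final row has y = 1.
```

The expansion makes the computational issue concrete: the augmented data set has one row per subject per at-risk interval, so with many subjects, many intervals, and high-dimensional covariates replicated across rows, standard GLM fitting quickly becomes expensive.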