🤖 AI Summary
In high-dimensional discrete-time risk models, data augmentation imposes a substantial computational burden and makes it difficult to achieve estimation accuracy and variable selection simultaneously. To address this, we propose a batchwise backfitting algorithm grounded in the distributional regression framework and, for the first time, extend it to discrete survival models with additive predictors. Our method integrates generalized linear models with an efficient data augmentation scheme, enabling joint parameter estimation and sparse variable selection. Extensive simulations and an application to real-world data on infant mortality in sub-Saharan Africa show that the proposed approach substantially reduces runtime and improves computational scalability while maintaining high estimation accuracy and robust variable selection. This work establishes an efficient, automated modeling paradigm for large-scale discrete-time event history analysis.
📝 Abstract
Discrete-time hazard models are widely used when event times are measured in intervals or are not precisely observed. While these models can be estimated using standard generalized linear model techniques, they rely on extensive data augmentation, making estimation computationally demanding in high-dimensional settings. In this paper, we demonstrate how the recently proposed Batchwise Backfitting algorithm, a general framework for scalable estimation and variable selection in distributional regression, can be effectively extended to discrete hazard models. Using both simulated data and a large-scale application on infant mortality in sub-Saharan Africa, we show that the algorithm delivers accurate estimates, automatically selects relevant predictors, and scales efficiently to large data sets. The findings underscore the algorithm's practical utility for analysing large-scale, complex survival data with high-dimensional covariates.
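The data augmentation the abstract refers to is the standard person-period expansion: each subject contributes one row per interval in which it is at risk, with a binary response that is 1 only in the final interval if the event occurred, after which the discrete hazard can be fit as a binary-response GLM. A minimal sketch of that expansion (illustrative only; the function name and toy data are assumptions, and this is not the paper's batchwise backfitting algorithm):

```python
def person_period(records):
    """Expand (subject_id, observed_interval, event_indicator) records
    into person-period rows for discrete-time hazard estimation.

    Each subject observed up to interval t contributes t rows, one per
    interval at risk; the binary response y is 1 only in the last
    interval and only if the event occurred (event_indicator == 1).
    A logistic or cloglog GLM on these rows estimates the discrete
    hazard h(t) = P(T = t | T >= t).
    """
    rows = []
    for sid, t, d in records:
        for k in range(1, t + 1):
            y = 1 if (k == t and d == 1) else 0
            rows.append((sid, k, y))
    return rows

# Toy data: subject 1 experiences the event in interval 3,
# subject 2 is censored after interval 2.
data = [(1, 3, 1), (2, 2, 0)]
expanded = person_period(data)
# 3 + 2 = 5 person-period rows; only subject 1's final row has y = 1.
```

The expansion makes the computational issue concrete: the augmented data set has one row per subject per at-risk interval, so with many subjects, many intervals, and high-dimensional covariates replicated across rows, standard GLM fitting quickly becomes expensive.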