Budget-constrained Active Learning to Effectively De-censor Survival Data

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the incomplete information caused by right-censoring in survival analysis, this paper proposes a budget-constrained active learning framework that selects the right-censored samples with maximal information gain and pays to acquire their costly ground-truth event times. The method uses an information-gain-based sampling strategy tailored to survival data, which supports partial uncensoring and progressive label acquisition; its bounds and time complexity are shown to be asymptotically equivalent to those of BatchBALD. The key contribution is a unified formulation that integrates active learning, survival modeling, and hard budget constraints. Experiments on several benchmark datasets indicate that the approach improves the predictive performance of survival models (e.g., Cox regression, DeepSurv) and performs better than other candidate approaches on several benchmarks.

📝 Abstract
Standard supervised learners attempt to learn a model from a labeled dataset. Given a small set of labeled instances and a pool of unlabeled instances, a budgeted learner can use its budget to pay for the labels of some unlabeled instances, which it can then use to produce a model. Here, we explore budgeted learning in the context of survival datasets, which include (right) censored instances, where we know only a lower bound on an instance's time-to-event. In this setting, the learner can pay to (partially) label a censored instance, e.g., to acquire the actual time for an instance [perhaps go from (3 yr, censored) to (7.2 yr, uncensored)], or other variants [e.g., learn about one more year, so go from (3 yr, censored) to either (4 yr, censored) or perhaps (3.2 yr, uncensored)]. This serves as a model of real-world data collection, where follow-up with censored patients does not always lead to uncensoring, and where the amount of information given to the learner during data collection depends on the budget and the nature of the data itself. We provide both experimental and theoretical results on how to apply state-of-the-art budgeted learning algorithms to survival data, and on the limitations of doing so. Our approach provides bounds and time complexity asymptotically equivalent to the standard active learning method BatchBALD. Moreover, empirical analysis on several survival tasks shows that our model performs better than other potential approaches on several benchmarks.
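The partial-labeling query described in the abstract can be illustrated with a minimal sketch. The names `Observation` and `query_follow_up` are hypothetical and do not come from the paper; the sketch assumes an oracle that knows the hidden true event time and reveals only what the purchased follow-up window would expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    time: float   # observed time in years
    event: bool   # True if the event was observed (uncensored)


def query_follow_up(obs: Observation, true_event_time: float,
                    extra_years: float) -> Observation:
    """Return the updated observation after buying `extra_years` of follow-up.

    `true_event_time` is hidden from the learner; only the oracle uses it.
    """
    if obs.event:
        return obs  # already uncensored, nothing more to learn
    if true_event_time < obs.time:
        raise ValueError("true event time cannot precede the censoring time")
    horizon = obs.time + extra_years
    if true_event_time <= horizon:
        # The event falls inside the purchased window: instance is uncensored.
        return Observation(time=true_event_time, event=True)
    # The event is still unseen: the censoring time simply moves forward.
    return Observation(time=horizon, event=False)


start = Observation(time=3.0, event=False)
print(query_follow_up(start, true_event_time=3.2, extra_years=1.0))
print(query_follow_up(start, true_event_time=7.2, extra_years=1.0))
print(query_follow_up(start, true_event_time=7.2, extra_years=float("inf")))
```

The three calls reproduce the abstract's cases: one extra year yields either (3.2 yr, uncensored) or (4 yr, censored), while unlimited follow-up yields (7.2 yr, uncensored).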
Problem

Research questions and friction points this paper is trying to address.

Developing budget-constrained active learning for survival data with censored instances
Acquiring partial labels for censored instances to improve time-to-event predictions
Providing theoretical and experimental analysis of budgeted learning limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active learning with budget constraints for survival data
Partially labeling censored instances to improve accuracy
Theoretical bounds and time complexity asymptotically equivalent to BatchBALD, with empirical validation on several survival benchmarks
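As a rough illustration of information-gain-based selection under a hard budget, the sketch below scores each censored instance with a BALD-style mutual information computed from an ensemble's predicted outcome distributions, then picks instances greedily by score per unit cost. This is not the paper's algorithm: it scores instances independently rather than jointly as BatchBALD does, and the function names, the discretised outcomes, and the greedy knapsack heuristic are all assumptions made for the example.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats, clipped to avoid log(0)."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=axis)


def bald_scores(probs):
    """Mutual information between outcome and model, per instance.

    probs has shape (n_models, n_instances, n_outcomes); each row is one
    ensemble member's distribution over the discretised query outcomes.
    """
    mean_p = probs.mean(axis=0)
    return entropy(mean_p) - entropy(probs).mean(axis=0)


def select_within_budget(scores, costs, budget):
    """Greedy heuristic: take the best score-per-cost items that still fit."""
    order = np.argsort(-scores / costs)
    chosen, spent = [], 0.0
    for i in order:
        if spent + costs[i] <= budget:
            chosen.append(int(i))
            spent += costs[i]
    return chosen


# Two ensemble members, three censored instances, two outcomes
# (event within the next year vs. still censored after it).
probs = np.array([
    [[0.9, 0.1], [0.5, 0.5], [0.99, 0.01]],
    [[0.1, 0.9], [0.5, 0.5], [0.99, 0.01]],
])
scores = bald_scores(probs)
print(select_within_budget(scores, costs=np.ones(3), budget=1.0))
```

Instance 0 is selected because the ensemble members disagree about it. Instance 1 is uncertain but the members agree, so querying it is expected to reveal little about the model.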