🤖 AI Summary
To address the incomplete information caused by right-censoring in survival analysis, this paper proposes a budget-constrained active learning framework that selects the right-censored samples with maximal information gain for costly acquisition of their ground-truth event times. Methodologically, we design an information-gain-based sampling strategy tailored to survival data, supporting both partial uncensoring and progressive label acquisition; its bounds and time complexity are shown theoretically to be asymptotically equivalent to those of BatchBALD. Our key contribution is the first unified formulation that integrates active learning, survival modeling, and hard budget constraints while retaining theoretical interpretability and computational efficiency. Experiments on multiple benchmark datasets show that our approach improves the predictive performance and robustness of survival models (e.g., Cox regression, DeepSurv) and outperforms state-of-the-art active learning and survival analysis baselines.
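The idea of spending a hard budget on the most informative censored samples can be illustrated with a minimal sketch. It is not the paper's acquisition function: the function name `select_within_budget`, the per-instance `scores` (standing in for an information-gain estimate), and the per-instance `costs` are all hypothetical, and the greedy ranking below scores instances independently rather than jointly as BatchBALD does.

```python
from typing import List, Sequence, Tuple


def select_within_budget(
    scores: Sequence[float],
    costs: Sequence[float],
    budget: float,
) -> Tuple[List[int], float]:
    """Greedily pick censored instances by descending score under a hard budget.

    `scores` is an assumed stand-in for an information-gain estimate and
    `costs` is the assumed price of querying each instance.
    """
    chosen: List[int] = []
    spent = 0.0
    # Visit candidates from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    for i in order:
        # Skip any candidate that would push spending past the budget.
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return chosen, spent


if __name__ == "__main__":
    picked, used = select_within_budget(
        scores=[0.9, 0.2, 0.7, 0.5], costs=[2, 1, 2, 1], budget=4
    )
    print(picked, used)
```

Because the budget check is applied before every purchase, the total spend never exceeds `budget`, which is what makes the constraint hard rather than soft.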
📝 Abstract
Standard supervised learners attempt to learn a model from a labeled dataset. Given a small set of labeled instances and a pool of unlabeled instances, a budgeted learner can spend its budget to acquire the labels of some unlabeled instances, which it can then use to produce a model. Here, we explore budgeted learning in the context of survival datasets, which include (right) censored instances, where we know only a lower bound on an instance's time-to-event. In this setting, the learner can pay to (partially) label a censored instance, e.g., to acquire the actual time for an instance [perhaps going from (3 yr, censored) to (7.2 yr, uncensored)], or other variants [e.g., learning about one more year, and so going from (3 yr, censored) to either (4 yr, censored) or perhaps (3.2 yr, uncensored)]. This serves as a model of real-world data collection, where follow-up with censored patients does not always lead to uncensoring, and where the amount of information given to the learner during data collection is a function of the budget and the nature of the data itself. We provide both experimental and theoretical results on how to apply state-of-the-art budgeted learning algorithms to survival data, and on the limitations of doing so. Our approach provides bounds and time complexity asymptotically equivalent to those of the standard active learning method BatchBALD. Moreover, empirical analysis on several survival tasks shows that our model performs better than other potential approaches on several benchmarks.
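The partial-labeling examples in the abstract can be made concrete with a small sketch. The `Observation` class and the `follow_up` function are hypothetical names introduced here for illustration. The `true_time` argument plays the role of an oracle that is hidden from the learner and is only used to simulate what a paid follow-up would reveal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    time: float   # observed time in years
    event: bool   # True if the event was observed (uncensored)


def follow_up(obs: Observation, true_time: float, extra_years: float) -> Observation:
    """Simulate paying for `extra_years` of additional follow-up.

    `true_time` is the hidden event time (an assumption of this simulation).
    The result is uncensored only if the event falls inside the new window.
    """
    if obs.event:
        return obs  # already uncensored, so there is nothing more to learn
    horizon = obs.time + extra_years
    if true_time <= horizon:
        return Observation(time=true_time, event=True)
    # The event has still not occurred: the lower bound moves to the horizon.
    return Observation(time=horizon, event=False)


if __name__ == "__main__":
    start = Observation(time=3.0, event=False)
    print(follow_up(start, true_time=7.2, extra_years=1.0))           # still censored
    print(follow_up(start, true_time=3.2, extra_years=1.0))           # uncensored
    print(follow_up(start, true_time=7.2, extra_years=float("inf")))  # full label
```

The three calls correspond to the abstract's cases: one more year of follow-up yields either (4 yr, censored) or (3.2 yr, uncensored) depending on the hidden event time, while unlimited follow-up yields (7.2 yr, uncensored).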