Survival analysis under label shift

📅 2025-06-26
🤖 AI Summary
This paper addresses cross-distribution survival inference under concurrent label shift and random censoring: the source population P provides censored survival times T and covariates Z, while the target population Q contains only Z; the marginal distribution of T differs between P and Q, but the conditional distribution of Z given T remains invariant. To this end, we introduce label shift modeling to survival analysis for the first time and propose a unified parametric inference framework compatible with classical models, including Cox proportional hazards and accelerated failure time models. Our method integrates parametric modeling with nonparametric estimation of censoring-adjusted importance weights, performing estimation via maximization of a weighted approximate likelihood. We establish rigorous consistency and asymptotic normality of the resulting estimator. Extensive simulations and real-data analyses demonstrate strong validity and robustness across diverse censoring rates and label shift magnitudes, offering a novel paradigm at the intersection of survival analysis and distributional shift.

📝 Abstract
Let $P$ represent the source population with complete data, containing covariate $\mathbf{Z}$ and response $T$, and $Q$ the target population, where only the covariate $\mathbf{Z}$ is available. We consider a setting with both label shift and label censoring. Label shift assumes that the marginal distribution of $T$ differs between $P$ and $Q$, while the conditional distribution of $\mathbf{Z}$ given $T$ remains the same. Label censoring refers to the case where the response $T$ in $P$ is subject to random censoring. Our goal is to leverage information from the label-shifted and label-censored source population $P$ to conduct statistical inference in the target population $Q$. We propose a parametric model for $T$ given $\mathbf{Z}$ in $Q$ and estimate the model parameters by maximizing an approximate likelihood. This allows for statistical inference in $Q$ and accommodates a range of classical survival models. Under the label shift assumption, the likelihood depends not only on the unknown parameters but also on the unknown distribution of $T$ in $P$ and $\mathbf{Z}$ in $Q$, which we estimate nonparametrically. The asymptotic properties of the estimator are rigorously established and the effectiveness of the method is demonstrated through simulations and a real data application. This work is the first to combine survival analysis with label shift, offering a new research direction in this emerging topic.
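To see why the likelihood involves the distribution of $T$ in $P$ and of $\mathbf{Z}$ in $Q$, the following sketch applies Bayes' rule to an uncensored source observation. The notation is an assumption made for illustration and is not taken from the paper: $p$ and $q$ denote densities in $P$ and $Q$, $q(t \mid \mathbf{z}; \boldsymbol{\theta})$ is the parametric model, and $p_T$ and $q_{\mathbf{Z}}$ are the marginals of $T$ in $P$ and $\mathbf{Z}$ in $Q$. Censoring is ignored here, so the authors' approximate likelihood will differ in its details.

```latex
\begin{align*}
p(\mathbf{z} \mid t) &= q(\mathbf{z} \mid t)
  && \text{(label shift)} \\
q(\mathbf{z} \mid t) &=
  \frac{q(t \mid \mathbf{z}; \boldsymbol{\theta})\, q_{\mathbf{Z}}(\mathbf{z})}
       {\int q(t \mid \mathbf{z}'; \boldsymbol{\theta})\, q_{\mathbf{Z}}(\mathbf{z}')\, d\mathbf{z}'}
  && \text{(Bayes' rule in } Q\text{)} \\
p(\mathbf{z}, t) &= p_T(t)\,
  \frac{q(t \mid \mathbf{z}; \boldsymbol{\theta})\, q_{\mathbf{Z}}(\mathbf{z})}
       {\int q(t \mid \mathbf{z}'; \boldsymbol{\theta})\, q_{\mathbf{Z}}(\mathbf{z}')\, d\mathbf{z}'}
  && \text{(joint density in } P\text{)}
\end{align*}
```

The last line depends on $\boldsymbol{\theta}$, $p_T$, and $q_{\mathbf{Z}}$ together, which is consistent with the abstract's statement that the two unknown distributions must be estimated alongside the parameters.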
Problem

Research questions and friction points this paper is trying to address.

Survival analysis under label shift and censoring
Inference in the target population Q using the label-shifted, censored source population P
Estimate model parameters via approximate likelihood maximization
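The data setting described above can be illustrated with a toy simulation. This is a minimal sketch under assumed distributions, none of which come from the paper: exponential survival times with different rates in P and Q, a Gaussian covariate whose distribution given T is shared by both populations, and exponential censoring that is independent of T and Z. The function name `simulate` and all parameter values are hypothetical.

```python
import random

def simulate(n, rate_t, censor_rate, seed, observe_labels=True):
    """Draw n units from a toy population (illustrative assumptions only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.expovariate(rate_t)       # marginal of T: differs between P and Q
        z = rng.gauss(0.5 * t, 1.0)       # Z given T: same mechanism in P and Q
        if not observe_labels:
            rows.append((z,))             # target Q: covariate only
            continue
        c = rng.expovariate(censor_rate)  # censoring time, independent of (T, Z)
        rows.append((z, min(t, c), int(t <= c)))  # (Z, observed time, event flag)
    return rows

# Source P: censored labels plus covariates. Target Q: covariates only,
# drawn with a different marginal rate for T (the label shift).
source = simulate(500, rate_t=1.0, censor_rate=0.5, seed=0)
target = simulate(500, rate_t=0.5, censor_rate=0.5, seed=1, observe_labels=False)
```

Each source row holds the covariate, the observed time (the smaller of the survival and censoring times), and an event indicator, while each target row holds the covariate alone.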
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parametric model for T given Z in Q
Maximize approximate likelihood for estimation
Nonparametric estimation of unknown distributions
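For censored data, a standard nonparametric estimator of a survival distribution is the Kaplan-Meier product-limit estimator. The summary here does not say which estimator the authors use for the distribution of T in P, so the sketch below is an assumption chosen for illustration; the function name `kaplan_meier` and the sample data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : observed times (minimum of survival and censoring time)
    events : 1 if the survival time was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        ties = 0
        # Group all observations sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            ties += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk  # product-limit update
            curve.append((t, surv))
        n_at_risk -= ties  # events and censored units both leave the risk set
    return curve

curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

On this small sample the estimated survival function steps down at times 2, 3, and 5 to approximately 0.8, 0.6, and 0.3; the censored observations at times 3 and 8 reduce the risk set without producing a step.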
Yuxiang Zong
Research Centre for Operations Research and Statistics, KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium
Yanyuan Ma
Department of Statistics, Pennsylvania State University, University Park, PA 16802, USA
Ingrid Van Keilegom
KU Leuven
Survival analysis, Semi- and nonparametric regression, Measurement errors