🤖 AI Summary
Existing methods for digitizing Kaplan-Meier (KM) curves have three key limitations: errors in coordinate extraction, the unrealistic assumption of uniform censoring, and the inability to reconstruct individual patient data (IPD) for subgroups from summary statistics alone. This paper introduces an uncertainty-aware framework for subgroup-level IPD reconstruction. It combines two modules. VEC-KM extracts high-precision coordinates and censoring marks from vectorized figures. CEN-KM resolves overlapping censor symbols, so censoring no longer has to be assumed uniform. The framework then applies the MAPLE algorithm, which infers patient-level subgroup labels under marginal constraints derived from published summary statistics and propagates the resulting evidence. This yields ensembles of statistically feasible labelings rather than a single assignment. The framework was evaluated on four phase III trials in esophageal squamous cell carcinoma. It produced accurate reconstructions and robust, uncertainty-aware estimates of treatment effect in the PD-L1 low-expression subgroup. The result is an approach to IPD reconstruction in precision oncology that is high-fidelity, interpretable, and quantitatively calibrated for uncertainty.
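The summary does not say exactly how censor marks enter the reconstruction. Here is a minimal sketch, assuming the digitized curve supplies two things: the survival value just after each downward step, and the time of every censor mark, with overlapping marks at the same time listed once per patient. Under those assumptions, the product-limit formula can be inverted step by step. At an event time t with n patients at risk, `S(t) = S(t-) * (1 - d/n)`, so `d = n * (1 - S(t)/S(t-))`. Using the observed censor times, rather than spreading censoring uniformly between time points, fixes who leaves the risk set and when. This is a simplified illustration, not the authors' CEN-KM. `Record`, `reconstruct_ipd`, `kaplan_meier`, and the `last_followup` parameter are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Record:
    time: float
    event: int  # 1 = event observed at `time`, 0 = censored at `time`


def reconstruct_ipd(n_total: int, steps: List[Tuple[float, float]],
                    censor_times: List[float], last_followup: float) -> List[Record]:
    """Invert the KM product-limit formula using explicit censor-mark times.

    steps:         (time, survival) just after each downward step, sorted by time
    censor_times:  one entry per censored patient (repeat a time for overlapping marks)
    last_followup: hypothetical time at which patients still at risk are censored
    """
    censors = sorted(censor_times)
    records: List[Record] = []
    at_risk, s_prev, ci = n_total, 1.0, 0
    for t, s in steps:
        # Patients censored strictly before t leave the risk set first;
        # a censor mark at exactly t is still counted as at risk at t.
        while ci < len(censors) and censors[ci] < t:
            records.append(Record(censors[ci], 0))
            at_risk -= 1
            ci += 1
        if s_prev <= 0.0 or at_risk <= 0:
            break
        # S(t) = S(t-) * (1 - d/n)  =>  d = n * (1 - S(t)/S(t-)), rounded to a count
        d = round(at_risk * (1.0 - s / s_prev))
        d = max(0, min(d, at_risk))
        records.extend(Record(t, 1) for _ in range(d))
        at_risk -= d
        s_prev = s
    # Censor marks after the last step, then administrative censoring of the rest.
    for c in censors[ci:]:
        records.append(Record(c, 0))
        at_risk -= 1
    records.extend(Record(last_followup, 0) for _ in range(max(at_risk, 0)))
    return records


def kaplan_meier(records: List[Record]) -> List[Tuple[float, float]]:
    """Product-limit estimate at each distinct event time (consistency check)."""
    event_times = sorted({r.time for r in records if r.event == 1})
    surv, curve = 1.0, []
    for t in event_times:
        n = sum(1 for r in records if r.time >= t)
        d = sum(1 for r in records if r.time == t and r.event == 1)
        surv *= 1.0 - d / n
        curve.append((t, surv))
    return curve


# Toy curve for illustration only (not digitized from any trial).
steps = [(1.0, 0.9), (2.0, 0.675), (4.0, 0.54)]
ipd = reconstruct_ipd(10, steps, censor_times=[1.5, 3.0, 5.0], last_followup=6.0)
print(sum(r.event for r in ipd), len(ipd))  # 4 10
print(kaplan_meier(ipd))
```

Re-running the Kaplan-Meier estimator on the output reproduces the input steps, which makes a quick consistency check for any reconstruction. Rounding `d` to an integer absorbs small digitization errors in the survival values.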
📝 Abstract
Individual patient data (IPD) from oncology trials are essential for reliable evidence synthesis but are rarely publicly available, so they must often be reconstructed from published Kaplan-Meier (KM) curves. Existing reconstruction methods suffer from digitization errors, unrealistic uniform censoring assumptions, and the inability to recover subgroup-level IPD when only aggregate statistics are available. To address these limitations, we developed RESOLVE-IPD, a unified computational framework for high-fidelity IPD reconstruction and uncertainty-aware subgroup meta-analysis. RESOLVE-IPD comprises two components. The first component, High-Fidelity IPD Reconstruction, integrates the VEC-KM and CEN-KM modules. VEC-KM extracts precise KM coordinates and explicit censoring marks from vectorized figures, minimizing digitization error. CEN-KM corrects overlapping censor symbols, removing the need to assume uniform censoring. The second component, Uncertainty-Aware Subgroup Recovery, uses the MAPLE (Marginal Assignment of Plausible Labels and Evidence Propagation) algorithm when subgroup KM curves are unavailable. MAPLE infers patient-level subgroup labels consistent with published summary statistics (e.g., hazard ratio, median overall survival). It generates ensembles of mathematically valid labelings, enabling a propagating meta-analysis that quantifies the uncertainty introduced by subgroup reconstruction and carries it into the pooled results. RESOLVE-IPD was validated through a subgroup meta-analysis of four trials in advanced esophageal squamous cell carcinoma, focusing on the programmed death-ligand 1 (PD-L1)-low population. RESOLVE-IPD enables accurate IPD reconstruction and robust, uncertainty-aware subgroup meta-analyses, strengthening the reliability and transparency of secondary evidence synthesis in precision oncology.
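The abstract does not specify how the labeling ensemble feeds into the meta-analysis. One standard way to propagate this kind of uncertainty is Rubin's rules, which are borrowed from multiple imputation rather than taken from the paper. The procedure is to run the meta-analysis once per labeling, average the pooled estimates, and inflate the variance by the between-labeling spread, `T = W + (1 + 1/M) * B`. The sketch below rests on two assumptions. First, each labeling has already produced a (log hazard ratio, standard error) pair per trial. Second, inverse-variance fixed-effect pooling is used for brevity. `fixed_effect` and `pool_over_labelings` are hypothetical names, and the inputs are toy values.

```python
import math
from statistics import mean, variance
from typing import List, Tuple

Estimate = Tuple[float, float]  # (log hazard ratio, standard error) for one trial


def fixed_effect(trials: List[Estimate]) -> Tuple[float, float]:
    """Inverse-variance fixed-effect pooling across trials; returns (estimate, variance)."""
    weights = [1.0 / se ** 2 for _, se in trials]
    pooled = sum(w * y for w, (y, _) in zip(weights, trials)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def pool_over_labelings(per_labeling: List[List[Estimate]]) -> Tuple[float, float]:
    """Combine meta-analyses run under M sampled labelings using Rubin's rules.

    per_labeling[m] holds the per-trial subgroup estimates obtained when the
    reconstructed IPD is split by the m-th plausible labeling (assumed input).
    """
    results = [fixed_effect(trials) for trials in per_labeling]
    q = [est for est, _ in results]
    u = [var for _, var in results]
    m = len(results)
    q_bar = mean(q)                          # pooled log HR across labelings
    within = mean(u)                         # average within-labeling variance W
    between = variance(q) if m > 1 else 0.0  # spread B caused by label uncertainty
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Made-up toy inputs: two labelings x two trials (not results from the paper).
toy = [[(-0.20, 0.10), (-0.40, 0.20)],
       [(-0.30, 0.10), (-0.50, 0.20)]]
log_hr, var = pool_over_labelings(toy)
half_width = 1.96 * math.sqrt(var)  # normal approximation for simplicity
print(f"HR {math.exp(log_hr):.3f} "
      f"(95% CI {math.exp(log_hr - half_width):.3f}-{math.exp(log_hr + half_width):.3f})")
```

The between-labeling term vanishes when every labeling gives the same subgroup estimates. In that case the interval collapses to the ordinary meta-analysis interval. The more the plausible labelings disagree, the wider the pooled interval becomes. A random-effects model or t-based intervals would be natural refinements of this sketch.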