Efficient and scalable clustering of survival curves

📅 2025-12-18
🤖 AI Summary
Conventional methods for clustering survival curves rely on computationally intensive bootstrap resampling, which limits their use on large-scale survival data. This paper proposes a bootstrap-free clustering framework that combines k-means with the log-rank test: it introduces a distance metric grounded in the log-rank statistic, which explicitly models differences among survival functions, and an adaptive, statistically principled strategy for determining the number of clusters. Removing the bootstrap step while preserving statistical validity yields a substantial computational speedup without sacrificing clustering accuracy. In the empirical evaluation, the proposed method attains clustering performance comparable to bootstrap-based approaches while reducing runtime by over an order of magnitude. The result is a practical, scalable option for large-scale survival analysis.

📝 Abstract
Survival analysis encompasses a broad range of methods for analyzing time-to-event data, with one key objective being the comparison of survival curves across groups. Traditional approaches for identifying clusters of survival curves often rely on computationally intensive bootstrap techniques to approximate the distribution of the test statistic under the null hypothesis. While effective, these methods impose significant computational burdens. In this work, we propose a novel approach that leverages the k-means algorithm and the log-rank test to efficiently identify and cluster survival curves. Our method eliminates the need for computationally expensive resampling, significantly reducing processing time while maintaining statistical reliability. By systematically evaluating survival curves and determining the optimal number of clusters, the proposed method provides a practical and scalable alternative for large-scale survival data analysis. Through simulation studies, we demonstrate that our approach achieves results comparable to existing bootstrap-based clustering methods while dramatically improving computational efficiency. These findings suggest that the log-rank-based clustering procedure offers a viable and time-efficient solution for researchers working with multiple survival curves in medical and epidemiological studies.
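The log-rank test mentioned in the abstract compares observed and expected event counts at each event time, which is why it needs no resampling. The following is a minimal sketch of the standard two-sample version in plain Python, not the authors' implementation; the function name `logrank_two_sample` and the toy data are assumptions made for illustration.

```python
import math

def logrank_two_sample(times_a, events_a, times_b, events_b):
    """Standard two-sample log-rank test (illustrative helper, not from the paper).

    times_*  : observed times (event or censoring)
    events_* : 1 if the event was observed, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # sum over event times of (observed - expected) in group A
    variance = 0.0       # hypergeometric variance of that sum
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        n = n_a + n_b
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Toy data (made up): group A fails early, group B fails late, no censoring.
chi2, p = logrank_two_sample([1, 2, 3, 4, 5], [1] * 5,
                             [10, 11, 12, 13, 14], [1] * 5)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Because the p-value comes from a closed-form chi-square reference distribution, a single pass over the event times replaces the repeated resampling that a bootstrap test would need. Comparing more than two curves at once uses the K-sample form of the same test, which is not shown here.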
Problem

Research questions and friction points this paper is trying to address.

Efficiently clusters survival curves without resampling
Reduces computational burden in survival data analysis
Provides scalable alternative for large-scale survival studies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses k-means and log-rank test for clustering
Eliminates computationally expensive resampling techniques
Provides scalable solution for large-scale survival data
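
As a rough illustration of the k-means part, the sketch below evaluates a Kaplan-Meier estimate for each group on a common time grid and groups the resulting vectors with Lloyd's algorithm. This summary does not say how the paper represents curves for k-means, so the grid representation, the squared Euclidean distance, the farthest-point initialization, and all function names here are assumptions, not the authors' procedure.

```python
def kaplan_meier(times, events, grid):
    """Kaplan-Meier survival estimate evaluated at each point of `grid`."""
    steps, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        steps.append((t, surv))
    curve = []
    for g in grid:
        value = 1.0
        for t, s in steps:
            if t > g:
                break
            value = s
        curve.append(value)
    return curve

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data (made up): two early-failure groups and two late-failure groups.
groups = [
    ([1, 2, 3, 4, 5], [1] * 5),
    ([1, 2, 3, 4, 6], [1] * 5),
    ([10, 11, 12, 13, 14], [1] * 5),
    ([9, 11, 12, 13, 15], [1] * 5),
]
grid = [2, 4, 6, 8, 10, 12, 14]
curves = [kaplan_meier(t, e, grid) for t, e in groups]
labels = kmeans(curves, k=2)
print(labels)
```

In a full procedure of the kind the abstract describes, a candidate partition like this would then be checked with a log-rank test, and the number of clusters increased until the curves within each cluster are no longer significantly different. That selection loop is omitted here.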
Nora M. Villanueva
Galician Centre for Mathematical Research and Technology (CITMAga), Santiago de Compostela (Spain); Universidade de Vigo, Dep. of Statistics and O.R. & SiDOR Group, 36310 Vigo (Spain)
Marta Sestelo
Dep. Statistics and O.R, University of Vigo
Statistics and Data Analysis, Machine Learning, Software Development, R
Luis Meira-Machado
Centre of Mathematics, Universidade do Minho, Guimarães, Portugal