🤖 AI Summary
To address the poor scalability of conventional survival curve clustering methods, whose reliance on computationally intensive bootstrap resampling hinders their application to large-scale survival data, this paper proposes an efficient, bootstrap-free clustering framework. The method combines k-means with the log-rank test. It introduces a distance metric grounded in the log-rank statistic to quantify differences among survival functions, together with a statistically principled, adaptive strategy for determining the optimal number of clusters. By eliminating the bootstrap while preserving statistical validity, the approach substantially reduces computation time without sacrificing clustering accuracy. In simulation studies, the proposed method attains clustering performance comparable to state-of-the-art bootstrap-based approaches while reducing runtime by over an order of magnitude. The work offers a scalable alternative for large-scale survival analysis that balances statistical rigor with practical usability.
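For reference, the textbook two-sample log-rank statistic is shown below. The summary does not say exactly how the authors turn this statistic into a distance, so this is only the standard form such a metric would build on, not the paper's definition. Let $t_1 < \dots < t_J$ be the distinct observed event times in the pooled sample, $n_j$ and $d_j$ the number at risk and the number of events at $t_j$, and $n_{1j}$, $d_{1j}$ the same quantities for group 1.

```latex
U = \sum_{j=1}^{J} \left( d_{1j} - \frac{n_{1j}\, d_j}{n_j} \right),
\qquad
V = \sum_{j=1}^{J} \frac{n_{1j}\,(n_j - n_{1j})\, d_j\,(n_j - d_j)}{n_j^{2}\,(n_j - 1)},
\qquad
\chi^2_{\mathrm{LR}} = \frac{U^2}{V} \;\sim\; \chi^2_1 \ \text{under } H_0 .
```

Here $U$ is observed minus expected events in group 1 and $V$ is its hypergeometric variance. Because the null distribution is approximately $\chi^2_1$, a p-value is available in closed form, which is what makes a resampling-free comparison possible.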
📝 Abstract
Survival analysis encompasses a broad range of methods for analyzing time-to-event data, and one key objective is comparing survival curves across groups. Traditional approaches for identifying clusters of survival curves often rely on computationally intensive bootstrap techniques to approximate the distribution of the test statistic under the null hypothesis. While effective, these methods impose a significant computational burden. In this work, we propose a novel approach that combines k-means clustering with the log-rank test to identify and cluster survival curves efficiently. Our method eliminates the need for computationally expensive resampling, substantially reducing processing time while maintaining statistical reliability. By systematically comparing survival curves and determining the optimal number of clusters, the proposed method provides a practical and scalable alternative for large-scale survival data analysis. Through simulation studies, we demonstrate that our approach achieves results comparable to existing bootstrap-based clustering methods while dramatically improving computational efficiency. These findings suggest that the log-rank-based clustering procedure is a viable and time-efficient solution for researchers working with multiple survival curves in medical and epidemiological studies.
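The abstract does not spell out the algorithm, so the Python below is only a minimal sketch of one way k-means and the log-rank test could be combined without resampling. Its assumptions are not taken from the paper. Each group's Kaplan-Meier curve is evaluated on an evenly spaced time grid, k-means clusters those vectors, and the chosen number of clusters is the smallest k for which no two groups in the same cluster differ under a two-sample log-rank test at level `alpha`. The names `logrank_pvalue`, `km_on_grid`, `kmeans`, and `cluster_survival_curves` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One group of survival data: (times, events); event=1 observed, 0 right-censored.
Group = Tuple[Sequence[float], Sequence[int]]


def logrank_pvalue(a: Group, b: Group) -> float:
    """Two-sample log-rank test; p-value from the chi-square (1 df) approximation."""
    data = [(t, e, 0) for t, e in zip(*a)] + [(t, e, 1) for t, e in zip(*b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        risk = [(tt, e, g) for tt, e, g in data if tt >= t]  # at risk just before t
        n = len(risk)
        n_a = sum(1 for _, _, g in risk if g == 0)
        d = sum(1 for tt, e, _ in risk if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in risk if tt == t and e == 1 and g == 0)
        obs_minus_exp += d_a - n_a * d / n
        if n > 1:  # hypergeometric variance term
            var += n_a * (n - n_a) * d * (n - d) / (n * n * (n - 1))
    if var == 0.0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def km_on_grid(group: Group, grid: Sequence[float]) -> List[float]:
    """Kaplan-Meier survival estimate evaluated at each grid point (assumed representation)."""
    times, events = group
    steps, s = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for x in times if x >= t)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        s *= 1.0 - d / n
        steps.append((t, s))
    out = []
    for g in grid:
        val = 1.0
        for t, sv in steps:
            if t <= g:
                val = sv
        out.append(val)
    return out


def kmeans(points: List[List[float]], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's k-means with a deterministic, spread-out initialisation."""
    order = sorted(range(len(points)), key=lambda i: sum(points[i]))
    centers = [list(points[order[(j * (len(points) - 1)) // max(k - 1, 1)]]) for j in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_survival_curves(groups: List[Group], alpha: float = 0.05, grid_size: int = 50):
    """Hypothetical rule: smallest k whose clusters contain no significantly different pair."""
    t_max = max(max(times) for times, _ in groups)
    grid = [t_max * i / (grid_size - 1) for i in range(grid_size)]
    curves = [km_on_grid(g, grid) for g in groups]
    for k in range(1, len(groups) + 1):
        labels = kmeans(curves, k)
        homogeneous = all(
            logrank_pvalue(groups[i], groups[j]) > alpha
            for i in range(len(groups))
            for j in range(i + 1, len(groups))
            if labels[i] == labels[j]
        )
        if homogeneous:
            return k, labels
    return len(groups), list(range(len(groups)))
```

For example, with four groups where two have events at times 1 to 10 and two at times 21 to 30, this sketch should settle on two clusters. The within-cluster pairwise tests are not adjusted for multiple comparisons, which a real procedure would likely need to handle.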