🤖 AI Summary
This study addresses the lack of valid inferential methods for the win ratio statistic when applied to composite endpoints in cluster randomized trials. It proposes a unified inference framework tailored to such designs and systematically evaluates the performance of the win ratio, win odds, net benefit, and DOOR metrics under Wald tests, permutation tests, jackknife variance estimation, and likelihood ratio tests. Through extensive simulations and an empirical analysis of the STRIDE trial, the authors assess type I error control and statistical power of these approaches in finite samples, leveraging cluster rank-sum statistics, bivariate clustered U-statistics, and both analytical and jackknife-based variance estimators. The proposed methods are implemented in the WinsCRT R package, offering researchers a reproducible toolkit for practical application.
📝 Abstract
Win statistics have become increasingly popular for analyzing hierarchical composite endpoints in clinical trials, because they summarize treatment benefit through pairwise comparisons that respect the clinical importance order among outcome components. The win ratio, win odds, net benefit, and desirability of outcome ranking (DOOR) are all based on the same underlying pairwise comparison methodology and can complement one another in conveying the strength of the treatment effect. Despite recent progress on win statistics, methods for their statistical inference in cluster randomized trials (CRTs) remain underdeveloped. In this paper, we provide a comprehensive survey of testing procedures for the win ratio, win odds, net benefit, and DOOR in parallel-arm CRTs with hierarchical composite outcomes. For each win statistic, we then compare different testing procedures, including Wald tests based on cluster rank-sum statistics and bivariate clustered U-statistics, tests that use a cluster jackknife variance, a score permutation test, a permutation-based procedure with analytical variance estimation, and a likelihood ratio test derived from clustered jackknife estimates. Through simulation studies that vary cluster sizes, intracluster correlations, and censoring-induced ties, we characterize the finite-sample type I error and power of each procedure across a range of practical settings with small and large numbers of clusters. We illustrate our methods by reanalyzing the Strategies to Reduce Injuries and Develop Confidence in Elders (STRIDE) pragmatic CRT, and implement all win statistics methods in the WinsCRT R package.
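To make the shared pairwise-comparison idea concrete, the following is a minimal Python sketch of how the four point estimates could be computed from one set of wins, losses, and ties. It is not the WinsCRT API: the function names `compare` and `win_statistics`, the tuple encoding of outcomes (components listed from most to least important, larger is better), and the toy data are all assumptions made for illustration. The sketch ignores censoring and clustering, so it gives point estimates only and says nothing about the variance or testing procedures the abstract compares.

```python
from itertools import product


def compare(treated, control):
    """Hierarchical comparison of two participants (hypothetical helper).

    Each participant is a tuple of outcome components ordered from most
    to least clinically important, with larger values being better.
    Returns 1 if the treated participant wins, -1 if they lose, and 0
    if every component is tied.
    """
    for t, c in zip(treated, control):
        if t > c:
            return 1
        if t < c:
            return -1
    return 0


def win_statistics(treated_arm, control_arm):
    """Compute win ratio, win odds, net benefit, and DOOR probability
    from all treated-versus-control pairs (illustrative sketch only)."""
    wins = losses = ties = 0
    for t, c in product(treated_arm, control_arm):
        result = compare(t, c)
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            ties += 1

    n_pairs = wins + losses + ties
    half_ties = 0.5 * ties
    return {
        # Ratio of wins to losses; ties are ignored.
        "win_ratio": wins / losses if losses else float("inf"),
        # Ties are split evenly between the two arms.
        "win_odds": (wins + half_ties) / (losses + half_ties)
        if (losses + half_ties) else float("inf"),
        # Difference between win and loss proportions.
        "net_benefit": (wins - losses) / n_pairs,
        # Probability of a more desirable outcome, counting a tie as half.
        "door": (wins + half_ties) / n_pairs,
    }


# Toy data (made up): (survival indicator, secondary score).
treated_arm = [(1, 5), (1, 3), (0, 2)]
control_arm = [(1, 3), (0, 4), (0, 2)]
stats = win_statistics(treated_arm, control_arm)
print(stats)
```

In this toy example the nine pairs yield 5 wins, 2 losses, and 2 ties, so the four statistics differ only in how they combine the same counts. In a CRT, participants within a cluster are correlated, which is why inference on these estimates requires the cluster-level procedures surveyed in the paper rather than methods that treat pairs as independent.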