🤖 AI Summary
Conventional variable-ratio matching splits the data into strata by estimated entire number and performs a separate 1-to-k match within each stratum. As a result, fine balance on selected covariates often fails in the pooled sample, and more control units are discarded than necessary. To address this, we propose a one-shot variable-ratio matching algorithm that replaces this stratified, divide-and-conquer paradigm. It produces exact fine balance on selected covariates between the treated and control groups in the matched sample and is implemented in the R package *match2C*. Because it matches the full sample in a single step rather than stratum by stratum, it needs no pre-specified strata and retains more control units. In simulations and in an application to data on right heart catheterization and mortality among critically ill patients, the method achieves comparable or superior balance across many covariates and retains more controls than the divide-and-conquer approach.
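The idea that stratum-wise 1-to-k matches can each be finely balanced while the pooled sample is not can be made concrete with a toy check. The following is a minimal Python sketch on hypothetical data: stratum A is matched 1-to-1, stratum B is matched 1-to-2, and each stratum is finely balanced on its own. The helper names (`is_finely_balanced`, `control_counts`, `by_stratum`) are illustrative and are not part of match2C. The summary does not say whether pooled balance is judged on raw control counts or with each control weighted by 1/k (k being the number of controls in its matched set), so the sketch checks both. It shows one way pooling can break fine balance, not necessarily the paper's own account of it.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical matched sets: (stratum, treated unit's category, control categories).
# Stratum "A" was matched 1-to-1 and stratum "B" 1-to-2.
MATCHED_SETS = [
    ("A", "x", ["x"]),
    ("A", "x", ["x"]),
    ("B", "y", ["y", "y"]),
    ("B", "x", ["x", "x"]),
]


def treated_counts(sets):
    """Number of treated units in each category of the fine-balance covariate."""
    return Counter(treated for _, treated, _ in sets)


def control_counts(sets, weighted=False):
    """Number of controls per category; if weighted, each control counts 1/k,
    where k is the number of controls in its matched set (assumed convention)."""
    counts = Counter()
    for _, _, controls in sets:
        w = Fraction(1, len(controls)) if weighted else Fraction(1)
        for category in controls:
            counts[category] += w
    return counts


def proportions(counts):
    """Turn category counts into an exact marginal distribution."""
    total = sum(counts.values())
    return {category: Fraction(n) / total for category, n in counts.items()}


def is_finely_balanced(sets, weighted=False):
    """Fine balance: treated and control marginal distributions are identical."""
    return proportions(treated_counts(sets)) == proportions(control_counts(sets, weighted))


def by_stratum(sets):
    """Group matched sets by their stratum label, preserving order."""
    groups = {}
    for matched_set in sets:
        groups.setdefault(matched_set[0], []).append(matched_set)
    return groups


for label, group in by_stratum(MATCHED_SETS).items():
    print(label, is_finely_balanced(group))              # A True, then B True
print(is_finely_balanced(MATCHED_SETS))                  # False (raw counts)
print(is_finely_balanced(MATCHED_SETS, weighted=True))   # True (1/k weights)
```

With raw counts, the pooled controls over-represent the stratum that used the larger k (category x makes up 3/4 of the treated units but only 2/3 of the controls). With 1/k weights, the two marginal distributions agree again.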
📝 Abstract
Variable-ratio matching is a flexible alternative to conventional $1$-to-$k$ matching for designing observational studies that emulate a target randomized controlled trial (RCT). To achieve fine balance (that is, matching the treated and control groups so that they have the same marginal distribution of selected covariates), conventional approaches typically partition the data into strata based on estimated entire numbers and then perform a series of $1$-to-$k$ matches within each stratum, with $k$ determined by the stratum-specific entire number. This "divide-and-conquer" strategy has notable limitations: (1) fine balance typically does not hold in the final pooled sample, and (2) more controls may be discarded than necessary. To address these limitations, we propose a one-shot variable-ratio matching algorithm. Our method produces designs with exact fine balance on selected covariates in the matched sample, mimicking a hypothetical RCT in which units are first grouped into sets of different sizes and, within each set, one unit is assigned to treatment and the others to control. Moreover, compared with the "divide-and-conquer" approach, our method achieves comparable or superior balance across many covariates and retains more controls in the final matched design. We demonstrate the advantages of the proposed design over the conventional approach through simulations and an analysis of a dataset on the effect of right heart catheterization on mortality among critically ill patients. The algorithm is implemented in the R package match2C.
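The hypothetical RCT that the matched design mimics (sets of different sizes, one treated unit chosen per set) can be checked exactly on a toy example. The Python sketch below uses hypothetical covariate values, and `expected_means` is an illustrative name. It enumerates every equally likely choice of one treated unit per set and computes exact expected covariate means. It assumes a common convention that the abstract does not state: each control in a set of size n is weighted 1/(n - 1).

```python
from fractions import Fraction
from itertools import product

# Hypothetical covariate values for units grouped into sets of sizes 2, 3, and 4.
SETS = [[1, 3], [0, 2, 7], [5, 5, 1, 9]]


def expected_means(sets):
    """Exact expectations under the hypothetical RCT: in every set, one unit is
    chosen uniformly at random for treatment and the rest serve as controls.

    Returns (treated mean, 1/(n-1)-weighted control mean, unweighted control mean),
    each averaged over all equally likely assignments.
    """
    assignments = list(product(*(range(len(s)) for s in sets)))
    n_sets = len(sets)
    n_controls = sum(len(s) - 1 for s in sets)  # same for every assignment
    treated = weighted = unweighted = Fraction(0)
    for choice in assignments:
        # choice[i] is the index of the treated unit in set i.
        for s, t in zip(sets, choice):
            k = len(s) - 1  # number of controls in this set
            control_total = sum(s) - s[t]
            treated += Fraction(s[t], n_sets)
            weighted += Fraction(control_total, k * n_sets)
            unweighted += Fraction(control_total, n_controls)
    m = len(assignments)
    return treated / m, weighted / m, unweighted / m


t_mean, w_mean, u_mean = expected_means(SETS)
print(t_mean, w_mean, u_mean)  # 10/3 10/3 23/6
```

Under the 1/(n - 1) weighting, the expected treated and control means coincide (both 10/3). The unweighted pooled control mean (23/6) does not, because larger sets contribute more controls. This is why per-set weights matter once sets have different sizes.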