AI Summary
In individual-level observational studies, the synthetic control (SC) method suffers from the curse of dimensionality: large, high-dimensional donor pools degrade its performance. To address this, we propose ClusterSC, a clustering-driven donor selection framework that integrates unsupervised clustering (e.g., K-means or spectral clustering) directly into the SC pipeline. Donor units are first partitioned into clusters, and for each treated unit, donors are selected and weighted exclusively from its nearest cluster. We derive an improved theoretical bound on estimation error and use cross-validation to choose the number of clusters and the donor composition. Extensive experiments on multiple synthetic benchmarks and real-world healthcare and economic datasets show that ClusterSC reduces average causal effect estimation error by 22% to 38% relative to standard SC. It also significantly improves scalability, robustness to noise and heterogeneity, and out-of-sample generalization.
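The pipeline described above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. It clusters donors with plain K-means on their pre-treatment outcome trajectories (spectral clustering is not shown). It treats the cluster whose centroid lies closest to the treated unit's pre-treatment trajectory as its "nearest cluster." It then fits classical SC weights (nonnegative, summing to one) on that cluster only, using projected gradient descent. The helper names (`project_to_simplex`, `sc_weights`, `kmeans`, `cluster_sc`, `choose_k`) are hypothetical, and so is the holdout scheme used to pick the number of clusters. The paper's clustering features, weight estimator, and cross-validation procedure may differ.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def sc_weights(Y_donors, y_target, n_iter=5000):
    """Classical SC weights: minimize ||y - A w||^2 over the simplex by projected gradient.

    Y_donors: (n_donors, T_pre) pre-treatment outcomes; y_target: (T_pre,).
    """
    A = Y_donors.T
    w = np.full(A.shape[1], 1.0 / A.shape[1])
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        w = project_to_simplex(w - step * (A.T @ (A @ w - y_target)))
    return w


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means with deterministic farthest-first initialization."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = ((X[:, None, :] - centroids[None]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids


def cluster_sc(Y_pre, Y_post, y_target_pre, k):
    """Cluster donors on pre-treatment outcomes, keep the cluster nearest the target, fit SC."""
    labels, centroids = kmeans(Y_pre, k)
    nearest = ((centroids - y_target_pre) ** 2).sum(axis=1).argmin()
    members = np.flatnonzero(labels == nearest)
    if members.size == 0:  # empty cluster: fall back to the full donor pool
        members = np.arange(len(Y_pre))
    w = sc_weights(Y_pre[members], y_target_pre)
    return Y_post[members].T @ w, members, w


def choose_k(Y_pre, y_target_pre, candidates, holdout=5):
    """Hypothetical CV: fit on early pre-periods, score MSE on the last `holdout` ones."""
    errs = {}
    for k in candidates:
        pred, _, _ = cluster_sc(Y_pre[:, :-holdout], Y_pre[:, -holdout:],
                                y_target_pre[:-holdout], k)
        errs[k] = float(np.mean((pred - y_target_pre[-holdout:]) ** 2))
    return min(errs, key=errs.get), errs


# Toy data: two donor groups with different dynamics; the target behaves like group A.
rng = np.random.default_rng(0)
T_pre, T_post = 20, 5
t = np.arange(T_pre + T_post)
group_a = np.sin(t / 3.0) + 0.1 * rng.normal(size=(10, t.size))
group_b = 3.0 + 0.2 * t + 0.1 * rng.normal(size=(10, t.size))
Y = np.vstack([group_a, group_b])  # donors 0-9 are group A, 10-19 are group B
target = group_a[:3].mean(axis=0)  # target = average of three group-A donors
Y_pre, Y_post = Y[:, :T_pre], Y[:, T_pre:]

best_k, cv_errs = choose_k(Y_pre, target[:T_pre], candidates=[1, 2, 3])
pred_post, members, weights = cluster_sc(Y_pre, Y_post, target[:T_pre], k=2)
print("CV errors by k:", cv_errs, "chosen k:", best_k)
print("selected donors:", members)
```

With `k=2` the target's nearest cluster is group A, so the SC weights are estimated over 10 donors instead of 20. Shrinking the pool this way is the mechanism the summary credits for lower estimation error. The toy data only illustrates the mechanics and does not reproduce any reported results.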
Abstract
In causal inference with observational studies, synthetic control (SC) has emerged as a prominent tool. SC has traditionally been applied to aggregate-level datasets, but more recent work has extended its use to individual-level data. Because individual-level datasets contain many more observed units, this shift exposes SC to the curse of dimensionality. To address this, we propose Cluster Synthetic Control (ClusterSC), based on the idea that individuals may form groups whose behavior is similar within a group but differs across groups. ClusterSC incorporates a clustering step to select only the donors relevant to the target unit. We provide theoretical guarantees on the improvements induced by ClusterSC, supported by empirical demonstrations on synthetic and real-world datasets. The results indicate that ClusterSC consistently outperforms classical SC approaches.