Semisupervised score based matching algorithm to evaluate the effect of public health interventions

📅 2024-03-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In observational studies, conventional matching methods can be biased when covariates differ in their importance for matching. To address this, we propose SCOTOMA, a semi-supervised one-to-one matching framework. SCOTOMA jointly leverages a small set of expert-annotated matched pairs and abundant unlabeled (unmatched) data to learn an interpretable quadratic scoring function that explicitly estimates covariate-specific importance weights. We establish consistency of the weight estimator when the true matching criterion is the quadratic score, and design an efficient matching search algorithm integrating consistency regularization and a simulated-annealing-inspired heuristic. On synthetic benchmarks, including settings where the model assumptions are violated, SCOTOMA outperforms popular methods such as Propensity Score Matching (PSM) and Covariate Balancing Propensity Score (CBPS). In a real-world application, SCOTOMA is used to estimate the causal effect of in-person instruction on community-level COVID-19 transmission rates, providing causal evidence to inform public health policy.
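The one-to-one matching step can be illustrated with a minimal sketch. It assumes the weights $\beta$ have already been learned and uses the quadratic score $S_{\beta}(x_i,x_j) = \beta^T (x_i-x_j)(x_i-x_j)^T \beta = (\beta^T (x_i-x_j))^2$ from the abstract. In place of the paper's own matching search (which this summary describes as using consistency regularization and a simulated-annealing-inspired heuristic), it substitutes a plain greedy search; the function names `score_matrix` and `greedy_one_to_one` and the toy data are hypothetical.

```python
import numpy as np


def score_matrix(beta, X_treated, X_control):
    """C[i, j] = S_beta(t_i, c_j) = (beta^T (t_i - c_j))^2 for every treated/control pair."""
    proj_t = X_treated @ beta  # beta^T t_i for each treated unit
    proj_c = X_control @ beta  # beta^T c_j for each control unit
    return (proj_t[:, None] - proj_c[None, :]) ** 2


def greedy_one_to_one(C):
    """Hypothetical greedy matcher (not the paper's search): repeatedly take the
    smallest remaining score, then retire that treated row and control column."""
    C = np.array(C, dtype=float)  # work on a copy
    n_t, n_c = C.shape
    if n_t > n_c:
        raise ValueError("need at least as many controls as treated units")
    pairs = []
    for _ in range(n_t):
        i, j = np.unravel_index(np.argmin(C), C.shape)
        pairs.append((int(i), int(j)))
        C[i, :] = np.inf  # treated unit i is now matched
        C[:, j] = np.inf  # control unit j may be used only once
    return sorted(pairs)


# Toy example with hypothetical weights: only covariate 0 matters.
beta = np.array([1.0, 0.0])
X_t = np.array([[0.0, 5.0], [2.0, -3.0]])
X_c = np.array([[2.2, 0.0], [0.1, 0.0], [9.0, 0.0]])
print(greedy_one_to_one(score_matrix(beta, X_t, X_c)))  # [(0, 1), (1, 0)]
```

Covariate 1 differs sharply between treated and control units here but is ignored because its weight is zero, which is the sense in which $\beta$ acts as a variable importance measure. Greedy matching is fast but not globally optimal; an optimal assignment method such as the Hungarian algorithm would instead minimize the total score.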

📝 Abstract
Multivariate matching algorithms "pair" similar study units in an observational study to remove potential bias and confounding effects caused by the absence of randomization. In one-to-one multivariate matching, a large number of "pairs" to be matched means both a large sample of information and a large number of matching tasks. A matching algorithm that is efficient and needs only limited auxiliary matching knowledge, provided by domain experts through a "training" set of paired units, is therefore of practical interest. We propose a novel one-to-one matching algorithm based on a quadratic score function $S_{\beta}(x_i,x_j)= \beta^T (x_i-x_j)(x_i-x_j)^T \beta$. The weights $\beta$, which can be interpreted as a variable importance measure, are designed to minimize the score difference between paired training units while maximizing the score difference between unpaired training units. Further, in the typical but intricate case where the training set is much smaller than the unpaired set, we propose a semisupervised companion one-to-one matching algorithm (SCOTOMA) that makes the best use of the unpaired units. The proposed weight estimator is proved to be consistent when the true matching criterion is indeed the quadratic score function. When the model assumptions are violated, we demonstrate through a series of simulations that the proposed algorithm still outperforms some popular competing matching algorithms. We apply the proposed algorithm to a real-world study investigating the effect of in-person schooling on community COVID-19 transmission rates for policy-making purposes.
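The weight-learning idea can be made concrete with a minimal sketch. Because $S_{\beta}(x_i,x_j) = (\beta^T (x_i-x_j))^2$, summing scores over a set of pairs gives $\beta^T A \beta$ with $A = \sum (x_i-x_j)(x_i-x_j)^T$. Assuming (this is not the authors' estimator) that "small scores for paired units, large scores for unpaired units" is encoded as maximizing the ratio $\beta^T U \beta / \beta^T P \beta$ of unpaired to paired totals, the maximizer is the top generalized eigenvector of $U\beta = \lambda P\beta$, a Fisher-discriminant-style criterion. The sketch uses only labeled training pairs and omits the semisupervised SCOTOMA step; `estimate_beta`, `scatter`, and the `ridge` parameter are hypothetical names.

```python
import numpy as np


def quadratic_score(beta, xi, xj):
    """S_beta(x_i, x_j) = beta^T (x_i - x_j)(x_i - x_j)^T beta, i.e. (beta^T (x_i - x_j))^2."""
    beta = np.asarray(beta, dtype=float)
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(beta @ np.outer(d, d) @ beta)


def scatter(X, pairs):
    """Sum of outer products d d^T over differences d = X[i] - X[j] for the given index pairs."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for i, j in pairs:
        d = X[i] - X[j]
        S += np.outer(d, d)
    return S


def estimate_beta(X, matched_pairs, unmatched_pairs, ridge=1e-6):
    """Hypothetical ratio estimator (not the paper's): maximize
    beta^T U beta / beta^T P beta by solving U beta = lam P beta."""
    p = X.shape[1]
    P = scatter(X, matched_pairs) + ridge * np.eye(p)  # ridge keeps P positive definite
    U = scatter(X, unmatched_pairs)
    L = np.linalg.cholesky(P)     # P = L L^T
    Linv = np.linalg.inv(L)
    M = Linv @ U @ Linv.T         # reduced symmetric problem M v = lam v, with v = L^T beta
    _, V = np.linalg.eigh(M)      # eigenvalues returned in ascending order
    beta = Linv.T @ V[:, -1]      # beta = L^{-T} v for the largest eigenvalue
    beta /= np.linalg.norm(beta)  # the ratio is scale-invariant, so fix ||beta|| = 1
    if beta[np.argmax(np.abs(beta))] < 0:
        beta = -beta              # fix the arbitrary eigenvector sign
    return beta


# Synthetic check: expert partners agree closely only on covariate 0.
rng = np.random.default_rng(0)
n_pairs, p = 20, 3
base = rng.normal(size=(n_pairs, p))
noise = rng.normal(size=(n_pairs, p)) * np.array([0.01, 1.0, 1.0])
X = np.vstack([base, base + noise])  # unit k is paired with unit k + n_pairs
matched = [(k, k + n_pairs) for k in range(n_pairs)]
unmatched = [(k, (k + 1) % n_pairs + n_pairs) for k in range(n_pairs)]
beta = estimate_beta(X, matched, unmatched)
print(np.round(beta, 3))  # expected: nearly all weight on covariate 0
print(quadratic_score([1.0, 2.0], [1.0, 1.0], [0.0, 0.0]))  # (1 + 2)^2 = 9.0
```

In this synthetic setup the paired units differ almost only in covariates 1 and 2, so any weight placed there inflates the paired scores; the ratio criterion therefore concentrates $\beta$ on covariate 0, mirroring the "variable importance" reading of the weights.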
Problem

Research questions and friction points this paper is trying to address.

Learning covariate importance for matching in observational studies
Addressing poor performance when covariates differ in relevance
Incorporating expert insight into policy-relevant research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-supervised framework learning covariate importance
Optimizes weighted quadratic score reflecting covariate relevance
Consistent weight estimates under the quadratic score model, with strong performance when that model is violated
Hongzhe Zhang
PhD Candidate, University of Pennsylvania
Statistics
Jiasheng Shi
School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), China
Jing Huang
Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA