C-kNN-LSH: A Nearest-Neighbor Algorithm for Sequential Counterfactual Inference

📅 2026-02-02

📈 Citations: 0

✨ Influential: 0

career value

188K/year

🤖 AI Summary

This study addresses the challenge of causal effect estimation in high-dimensional, confounded longitudinal data by proposing the C-kNN-LSH framework, which introduces locality-sensitive hashing (LSH) into longitudinal causal inference for the first time. The method efficiently matches “clinical twins”—patients with similar covariate histories—to estimate local conditional treatment effects under dynamic disease states. By integrating k-nearest neighbor matching with doubly robust correction, C-kNN-LSH effectively handles irregular sampling and heterogeneity in patient recovery patterns. The estimator is theoretically guaranteed to be consistent and second-order robust to perturbation errors. Evaluated on a cohort of 13,511 long COVID patients, the framework substantially outperforms existing baselines, demonstrating superior performance in modeling recovery heterogeneity and estimating the value of individualized treatment strategies.

Technology Category

Application Category

📝 Abstract

Estimating causal effects from longitudinal trajectories is central to understanding the progression of complex conditions and optimizing clinical decision-making, such as comorbidities and long COVID recovery. We introduce \emph{C-kNN--LSH}, a nearest-neighbor framework for sequential causal inference designed to handle such high-dimensional, confounded situations. By utilizing locality-sensitive hashing, we efficiently identify ``clinical twins''with similar covariate histories, enabling local estimation of conditional treatment effects across evolving disease states. To mitigate bias from irregular sampling and shifting patient recovery profiles, we integrate neighborhood estimator with a doubly-robust correction. Theoretical analysis guarantees our estimator is consistent and second-order robust to nuisance error. Evaluated on a real-world Long COVID cohort with 13,511 participants, \emph{C-kNN-LSH} demonstrates superior performance in capturing recovery heterogeneity and estimating policy values compared to existing baselines.

Problem

Research questions and friction points this paper is trying to address.

causal inference

longitudinal data

counterfactual estimation

treatment effect

confounding

Innovation

Methods, ideas, or system contributions that make the work stand out.

C-kNN-LSH

sequential causal inference

locality-sensitive hashing