Data fusion using weakly aligned sources

📅 2023-08-28
🏛️ Journal of the American Statistical Association
📈 Citations: 4
Influential: 0
🤖 AI Summary
Addressing the challenge of estimating a smooth, finite-dimensional parameter under weak alignment, where multi-source data exhibit partial rather than perfect correspondence and fully aligned samples are scarce, this paper proposes a novel semiparametric data fusion method. We establish, for the first time, the semiparametric efficiency bound under weak alignment and develop a theoretically grounded, robust estimator that jointly models alignment uncertainty and leverages auxiliary information, thereby substantially reducing reliance on fully aligned samples. Our approach relaxes the stringent strong-alignment assumption inherent in conventional fusion frameworks. Applied to data from two harmonized HIV monoclonal antibody prevention efficacy trials, it quantifies the association between a neutralizing antibody biomarker and HIV genotype, demonstrating improved statistical efficiency and practical applicability. Key contributions include: (i) derivation of the semiparametric efficiency bound under weak alignment; (ii) a computationally feasible, robust fusion algorithm with provable efficiency; and (iii) interpretable, real-world validation in a clinical setting.
📝 Abstract
We introduce a new data fusion method that utilizes multiple data sources to estimate a smooth, finite-dimensional parameter. Most existing methods only make use of fully aligned data sources that share common conditional distributions of one or more variables of interest. However, in many settings, the scarcity of fully aligned sources can make existing methods require unduly large sample sizes to be useful. Our approach enables the incorporation of weakly aligned data sources that are not perfectly aligned, provided their degree of misalignment is known up to finite-dimensional parameters. We quantify the additional efficiency gains achieved through the integration of these weakly aligned sources. We characterize the semiparametric efficiency bound and provide a general means to construct estimators achieving these efficiency gains. We illustrate our results by fusing data from two harmonized HIV monoclonal antibody prevention efficacy trials to study how a neutralizing antibody biomarker associates with HIV genotype.
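To make the idea of a weakly aligned source concrete, here is a toy Monte Carlo sketch. It is not the authors' estimator or their efficiency-bound construction. It assumes a hypothetical linear model in which a fully aligned source follows y = beta * x + noise, and a weakly aligned source follows the same model plus an unknown intercept shift. That one-dimensional shift plays the role of a misalignment "known up to a finite-dimensional parameter." The function names `slope_within`, `simulate_source`, and `compare` are illustrative only. Fitting a separate intercept per source absorbs the shift, so the weak source can still contribute to the estimate of the shared slope beta.

```python
import random
import statistics


def slope_within(groups):
    """Least-squares slope shared across groups, with a separate intercept per group.

    Each group is a list of (x, y) pairs. Centering x and y within each group
    removes the group-specific intercept (the assumed misalignment parameter),
    so only the common slope is estimated.
    """
    sxy = 0.0
    sxx = 0.0
    for pairs in groups:
        xbar = statistics.fmean(x for x, _ in pairs)
        ybar = statistics.fmean(y for _, y in pairs)
        for x, y in pairs:
            sxy += (x - xbar) * (y - ybar)
            sxx += (x - xbar) ** 2
    return sxy / sxx


def simulate_source(rng, n, beta, intercept):
    """Draw n pairs from y = intercept + beta * x + noise, with x and noise ~ N(0, 1)."""
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        out.append((x, intercept + beta * x + rng.gauss(0.0, 1.0)))
    return out


def compare(reps=1000, n_aligned=50, n_weak=200, beta=2.0, shift=1.5, seed=0):
    """Monte Carlo draws of the slope estimate with and without the weak source."""
    rng = random.Random(seed)
    aligned_only, fused = [], []
    for _ in range(reps):
        a = simulate_source(rng, n_aligned, beta, intercept=0.0)  # fully aligned
        b = simulate_source(rng, n_weak, beta, intercept=shift)   # shifted intercept
        aligned_only.append(slope_within([a]))
        fused.append(slope_within([a, b]))
    return aligned_only, fused


if __name__ == "__main__":
    aligned_only, fused = compare()
    print("aligned only: mean", statistics.fmean(aligned_only),
          "variance", statistics.variance(aligned_only))
    print("fused:        mean", statistics.fmean(fused),
          "variance", statistics.variance(fused))
```

Both estimators should center near the true slope. The fused estimator should show a noticeably smaller Monte Carlo variance, because the weak source adds information about beta even though its intercept is misaligned. In this simple setting, the gain comes from modeling the misalignment with one extra parameter instead of discarding the weak source.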
Problem

Estimates smooth parameters using weakly aligned data sources
Addresses scarcity of fully aligned sources in data fusion
Quantifies efficiency gains from integrating misaligned sources
Innovation

Utilizes weakly aligned data sources
Estimates finite-dimensional parameters efficiently
Characterizes semiparametric efficiency bound
Sijia Li
Institute of Information Engineering, Chinese Academy of Sciences
P. Gilbert
Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Center
Rui Duan
Harvard University
Biostatistics, Bioinformatics, Epidemiology, Electronic Health Record
Alexander Luedtke
Department of Statistics, University of Washington