Statistical method for pooling categorical biomarkers from multi-center matched/nested case-control studies

📅 2025-05-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multicenter matched/nested case-control studies, directly pooling categorical biomarker measurements can bias regression estimates because the measurements vary systematically between studies, assay platforms, and laboratories. The authors propose a likelihood-based method that incorporates study-specific calibration into the analysis of the biomarker-disease association. A sandwich variance estimator accounts for the additional uncertainty introduced by the calibration step, giving valid asymptotic variances for the estimated regression parameters. Simulation studies with varying sample sizes and biomarker-disease associations evaluate the finite-sample performance of the method. As an illustration, the method is applied to a vitamin D pooling project to assess the association between categorical vitamin D levels and colorectal cancer risk.

📝 Abstract
Pooled analyses that aggregate data from multiple studies are becoming increasingly common in collaborative epidemiologic research because they increase the size and diversity of the study population. However, biomarker measurements from different studies are subject to systematic measurement errors, and directly pooling them for analysis may lead to biased estimates of the regression parameters. Therefore, study-specific calibration processes must be incorporated in the statistical analyses to address between-study/assay/laboratory variability in the biomarker measurements. We propose a likelihood-based method to evaluate biomarker-disease relationships for categorical biomarkers in matched/nested case-control studies. To account for the additional uncertainties from the calibration processes, we propose a sandwich variance estimator to obtain valid asymptotic variances of the estimated regression parameters. Extensive simulation studies with varying sample sizes and biomarker-disease associations are used to evaluate the finite-sample performance of our proposed methods. As an illustration, we apply the methods to a vitamin D pooling project of colorectal cancer to evaluate the effect of categorical vitamin D levels on colorectal cancer risk.

Problem

Research questions and friction points this paper is trying to address.

Addressing systematic errors in multi-center biomarker pooling
Developing calibration methods for between-study assay variability
Evaluating categorical biomarker-disease relationships in case-control studies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Likelihood-based method for categorical biomarkers
Sandwich variance estimator for calibration uncertainties
Study-specific calibration for between-study variability
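
To make the sandwich idea in the list above concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: it fits an ordinary conditional logistic regression to matched sets and forms the generic sandwich variance (inverse information, times the sum of per-set score outer products, times inverse information). The calibration step and the extra variance terms it contributes in the paper are not reproduced. The function names, the three-category dummy coding, and the convention that the case sits at index 0 of each matched set are all assumptions made for illustration.

```python
import numpy as np


def score_and_info(X, beta):
    """Per-set scores and total information of the conditional log-likelihood.

    X has shape (n_sets, set_size, p); by assumption the case is at index 0.
    """
    eta = X @ beta
    eta = eta - eta.max(axis=1, keepdims=True)   # stabilise the exponentials
    w = np.exp(eta)
    w /= w.sum(axis=1, keepdims=True)            # within-set softmax weights
    xbar = np.einsum("sm,smp->sp", w, X)         # weighted mean covariate per set
    scores = X[:, 0, :] - xbar                   # case covariate minus weighted mean
    xc = X - xbar[:, None, :]
    info = np.einsum("sm,smp,smq->pq", w, xc, xc)  # summed within-set weighted covariance
    return scores, info


def fit_conditional_logit(X, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (beta, model-based cov, sandwich cov)."""
    beta = np.zeros(X.shape[2])
    for _ in range(max_iter):
        scores, info = score_and_info(X, beta)
        step = np.linalg.solve(info, scores.sum(axis=0))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    scores, info = score_and_info(X, beta)
    bread = np.linalg.inv(info)                  # inverse information
    meat = scores.T @ scores                     # sum of per-set score outer products
    return beta, bread, bread @ meat @ bread


def simulate_matched_sets(n_sets, set_size, beta, rng):
    """Toy data: a 3-category biomarker, dummy-coded with category 0 as reference."""
    cats = rng.integers(0, 3, size=(n_sets, set_size))
    X = np.stack([cats == 1, cats == 2], axis=-1).astype(float)
    prob = np.exp(X @ beta)
    prob /= prob.sum(axis=1, keepdims=True)
    # Draw which member of each set is the case, then move it to index 0.
    u = rng.random(n_sets)
    case = np.minimum((prob.cumsum(axis=1) < u[:, None]).sum(axis=1), set_size - 1)
    rows = np.arange(n_sets)
    case_x = X[rows, case].copy()
    X[rows, case] = X[rows, 0]
    X[rows, 0] = case_x
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = simulate_matched_sets(2000, 3, np.array([0.5, 1.0]), rng)
    beta_hat, model_cov, robust_cov = fit_conditional_logit(X)
    print("log odds ratios:", beta_hat)
    print("model-based SE: ", np.sqrt(np.diag(model_cov)))
    print("sandwich SE:    ", np.sqrt(np.diag(robust_cov)))
```

In this toy setting the model is correctly specified, so the model-based and sandwich standard errors should be close. The sandwich form matters when extra sources of variability, such as estimated calibration parameters, enter the estimating equations.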
Yujie Wu
Assistant Professor, The Hong Kong Polytechnic University
Brain-inspired AI, Computational neuroscience, Neuromorphic computing
Xiao Wu
Department of Biostatistics, Columbia Mailman School of Public Health, New York, NY
Mitchell H. Gail
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD
Regina G. Ziegler
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD
S. Smith-Warner
Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA; Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA
Molin Wang
Harvard School of Public Health
Biostatistics, Epidemiological methods