Refining capture-recapture methods to estimate case counts in a finite population setting

📅 2025-10-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Capture-recapture estimation in finite, closed populations—such as disease surveillance data containing only positive test results—is severely biased due to non-representative capture sources. Method: We propose two finite-population correction strategies and construct corrected Bayesian credible intervals by integrating a non-representative primary data stream with a representative random-sample anchor dataset, incorporating theoretical modeling and finite-population adjustments to improve unbiasedness and precision. Contribution/Results: This work is the first to systematically address non-representative capture-source bias in finite populations, providing computationally tractable analytical correction formulas and a unified Bayesian inference framework. Simulation studies show that our method reduces mean squared error by an average of 37%. Applied to real-world breast cancer recurrence surveillance data, it yields case estimates closer to independent validation benchmarks, and improves credible interval coverage from 68% to 94%.

Technology Category

Application Category

📝 Abstract
In this paper, we expand upon and refine a monitoring strategy proposed for surveillance of diseases in finite, closed populations. This monitoring strategy consists of augmenting an arbitrarily non-representative data stream (such as a voluntary flu testing program) with a random sample (referred to as an "anchor stream"). This design allows for the use of traditional capture-recapture (CRC) estimators, as well as recently proposed anchor stream estimators that more efficiently utilize the data. Here, we focus on a particularly common situation in which the first data stream only records positive test results, while the anchor stream documents both positives and negatives. Due to the non-representative nature of the first data stream, along with the fact that inference is being performed on a finite, closed population, there are standard and non-standard finite population effects at play. Here, we propose two methods of incorporating finite population corrections (FPCs) for inference, along with an FPC-adjusted Bayesian credible interval. We compare these approaches with existing methods through simulation and demonstrate that the FPC adjustments can lead to considerable gains in precision. Finally, we provide a real data example by applying these methods to estimating the breast cancer recurrence count among Metro Atlanta-area patients in the Georgia Cancer Registry-based Cancer Recurrence Information and Surveillance Program (CRISP) database.
Problem

Research questions and friction points this paper is trying to address.

Refining capture-recapture methods for finite population disease surveillance
Developing finite population corrections for non-representative data streams
Improving precision in estimating disease counts through anchor sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Augmenting non-representative data with random anchor stream
Applying finite population corrections for improved inference
Developing FPC-adjusted Bayesian credible intervals for precision
🔎 Similar Papers
No similar papers found.
M
Michael Doerfler
Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
W
Wenhao Mao
Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
L
Lin Ge
Department of Epidemiology and Biostatistics, School of Public Health, Indiana University, Bloomington, Indiana, U.S.A.
Y
Yuzi Zhang
Division of Biostatistics, College of Public Health, Ohio State University, Columbus, Ohio, U.S.A
T
Timothy L. Lash
Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, U.S.A.
K
Kevin C. Ward
Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, U.S.A.
Lance A. Waller
Lance A. Waller
Professor of Biostatistics and Bioinformatics, Rollins School of Pubic Health, Emory University
Spatial statisticspublic healthenvironmental statisticsbiostatisticsepidemiology
R
Robert H. Lyles
Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA