🤖 AI Summary
Capture-recapture estimation in finite, closed populations—such as disease surveillance data containing only positive test results—is severely biased due to non-representative capture sources.
Method: We propose two finite-population correction strategies and construct corrected Bayesian credible intervals by integrating a non-representative primary data stream with a representative random-sample anchor dataset, incorporating theoretical modeling and finite-population adjustments to improve unbiasedness and precision.
Contribution/Results: This work is the first to systematically address non-representative capture-source bias in finite populations, providing computationally tractable analytical correction formulas and a unified Bayesian inference framework. Simulation studies show that our method reduces mean squared error by an average of 37%. Applied to real-world breast cancer recurrence surveillance data, it yields case estimates closer to independent validation benchmarks, and improves credible interval coverage from 68% to 94%.
📝 Abstract
In this paper, we expand upon and refine a monitoring strategy proposed for surveillance of diseases in finite, closed populations. This monitoring strategy consists of augmenting an arbitrarily non-representative data stream (such as a voluntary flu testing program) with a random sample (referred to as an "anchor stream"). This design allows for the use of traditional capture-recapture (CRC) estimators, as well as recently proposed anchor stream estimators that more efficiently utilize the data. Here, we focus on a particularly common situation in which the first data stream only records positive test results, while the anchor stream documents both positives and negatives. Due to the non-representative nature of the first data stream, along with the fact that inference is being performed on a finite, closed population, there are standard and non-standard finite population effects at play. Here, we propose two methods of incorporating finite population corrections (FPCs) for inference, along with an FPC-adjusted Bayesian credible interval. We compare these approaches with existing methods through simulation and demonstrate that the FPC adjustments can lead to considerable gains in precision. Finally, we provide a real data example by applying these methods to estimating the breast cancer recurrence count among Metro Atlanta-area patients in the Georgia Cancer Registry-based Cancer Recurrence Information and Surveillance Program (CRISP) database.