🤖 AI Summary
To address the challenge of modeling sparse count data in infectious disease forecasting, particularly for regions that have historically reported zero cases, this chapter presents the Poisson Hierarchical Indian Buffet Process (PHIBP), a Bayesian nonparametric model grounded in absolute abundance rather than relative rates. Unlike conventional approaches that model incidence rates, PHIBP reduces sensitivity to zero counts by sharing statistical strength across related regions. It also brings the ecological measures of α- and β-diversity into epidemiological analysis, linking statistical modeling with biological interpretability. In experiments on infectious disease data, PHIBP yields coherent predictive distributions, accurate outbreak predictions in data-sparse regions, and comparative analyses of diversity within and between regions. The work offers a unified framework for epidemiological monitoring when data are sparse.
📝 Abstract
Modeling sparse count data, which arise across numerous scientific fields, presents significant statistical challenges. This chapter addresses these challenges in the context of infectious disease prediction, with a focus on predicting outbreaks in geographic regions that have historically reported zero cases. To this end, we present the detailed computational framework and experimental application of the Poisson Hierarchical Indian Buffet Process (PHIBP), a model that has demonstrated success in handling sparse count data in microbiome and ecological studies. The PHIBP's architecture, grounded in the concept of absolute abundance, systematically borrows statistical strength from related regions and circumvents the known sensitivity of relative-rate methods to zero counts. Through a series of experiments on infectious disease data, we show that this principled approach provides a robust foundation for generating coherent predictive distributions and for the effective use of comparative measures such as alpha and beta diversity. The algorithmic implementation and experimental results presented in the chapter indicate that this unified framework delivers both accurate outbreak predictions and meaningful epidemiological insights in data-sparse settings.
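To make the comparative measures named in the abstract concrete, here is a minimal sketch of plug-in alpha and beta diversity computed directly from raw counts. It is not the PHIBP itself, and the abstract does not say which indices the chapter uses: Shannon entropy and richness (alpha) and Bray-Curtis dissimilarity (beta) are assumed here as common choices. The region names and counts are hypothetical.

```python
import math

def richness(counts):
    """Alpha diversity as the number of categories with at least one case."""
    return sum(1 for c in counts if c > 0)

def shannon_alpha(counts):
    """Alpha diversity as Shannon entropy (natural log) of one region's counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an all-zero region gives a degenerate plug-in estimate
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(u, v):
    """Beta diversity as Bray-Curtis dissimilarity on absolute counts."""
    denom = sum(u) + sum(v)
    if denom == 0:
        return 0.0  # convention chosen here for two all-zero regions
    return sum(abs(a - b) for a, b in zip(u, v)) / denom

# Hypothetical case counts: one list per region, one entry per disease type.
counts = {
    "region_A": [12, 3, 0, 5],
    "region_B": [0, 0, 0, 0],   # a region with no reported cases
    "region_C": [4, 4, 4, 4],
}

for name, row in counts.items():
    print(name, richness(row), round(shannon_alpha(row), 3))

print(round(bray_curtis(counts["region_A"], counts["region_C"]), 3))
```

For the all-zero region these raw-count estimates carry no information: richness and entropy are both zero, and its dissimilarity to any region with cases is maximal. This is the situation in which a model that borrows strength across regions, as the abstract describes, would be expected to help.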