🤖 AI Summary
This study addresses probabilistic nowcasting and forecasting of SARS-CoV-2 variant proportions when locations differ in the amount of sequencing data available and sometimes in their trends. The authors describe hierarchical multinomial logistic regression (HMLR), a commonly used method that shares information across locations, and define a class of HMLR models for this task. Using the framework of the US SARS-CoV-2 Variant Nowcast Hub, they generated two years of retrospective weekly predictions (prediction dates from August 3, 2022, to August 7, 2024) and tested 12 HMLR models against a baseline model. The HMLR models outperformed the baseline in both probabilistic accuracy (energy score) and point accuracy (Brier score), with the largest gains in locations with more data. More complex HMLR models improved most in those high-data locations, simpler ones performed better in low-data locations, and no single model was best across all metrics.
📝 Abstract
Nowcasting and forecasting of infectious diseases have become increasingly important since the SARS-CoV-2 pandemic. In particular, methods for modeling the composition of circulating variants at a given time have seen more use, in part because of a large increase in the frequency of genomic sequencing conducted as part of routine surveillance. However, methods must take into account that locations have different amounts of data and sometimes different trends. We discuss hierarchical multinomial logistic regression (HMLR), a commonly used method for forecasting SARS-CoV-2 variants, which allows information to be shared across locations. We show how it has been used in the literature and define a class of HMLR models for SARS-CoV-2 variant nowcasting and forecasting. We rigorously test a subset of this class of models using the framework of the US SARS-CoV-2 Variant Nowcast Hub, a collaborative modeling project that launched in 2024. We created two years of weekly predictions based on retrospective datasets, with prediction dates ranging from Wednesday, August 3, 2022, to Wednesday, August 7, 2024. We tested 12 HMLR models against a baseline model on these datasets. We found that the HMLR models outperformed the baseline both in probabilistic accuracy, as measured by the energy score, and in point accuracy, as measured by the Brier score. Overall, HMLR models improved most over the baseline in locations with more data, and more complex HMLR models showed more improvement in those high-data locations; however, no single model was best across all metrics, and simpler HMLR models performed better in low-data locations. We conclude that HMLR models perform well in practice for nowcasting and forecasting SARS-CoV-2 variants.
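To make the two evaluation metrics concrete, the following is a minimal sketch of how an energy score and a multi-category Brier score could be computed for one location and date. It is an illustration under stated assumptions, not the Hub's or the authors' scoring code: the function names `energy_score` and `brier_score`, the sample-based estimator, and the choice to average the Brier score over individual sequences are all assumptions introduced here.

```python
import numpy as np


def energy_score(samples, observed):
    """Sample-based estimate of the energy score (lower is better).

    samples:  (m, K) array of m predictive draws over K variants
              (counts or proportions; hypothetical input format).
    observed: (K,) array holding the observed vector on the same scale.
    """
    samples = np.asarray(samples, dtype=float)
    observed = np.asarray(observed, dtype=float)
    # Mean Euclidean distance between each draw and the observation.
    term1 = np.linalg.norm(samples - observed, axis=1).mean()
    # Mean Euclidean distance between all pairs of draws (spread of the forecast).
    diffs = samples[:, None, :] - samples[None, :, :]
    term2 = np.linalg.norm(diffs, axis=2).mean()
    return float(term1 - 0.5 * term2)


def brier_score(pred_props, observed_counts):
    """Multi-category Brier score averaged over observed sequences (assumed form).

    pred_props:      (K,) predicted variant proportions summing to 1.
    observed_counts: (K,) number of sequences observed per variant.
    """
    p = np.asarray(pred_props, dtype=float)
    n = np.asarray(observed_counts, dtype=float)
    onehot = np.eye(len(p))
    # Row k: squared error of the prediction if a sequence belongs to variant k.
    per_variant = ((p[None, :] - onehot) ** 2).sum(axis=1)
    return float((n * per_variant).sum() / n.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    point_pred = np.array([0.6, 0.3, 0.1])   # made-up point prediction
    observed = np.array([55, 35, 10])        # made-up sequence counts
    # Made-up predictive draws of counts for 100 sequences.
    draws = rng.multinomial(100, point_pred, size=200)
    print("energy score:", energy_score(draws, observed))
    print("Brier score:", brier_score(point_pred, observed))
```

Both scores are negatively oriented, so a model outperforms the baseline when its score is lower. The energy score takes the whole set of predictive draws as input, which is why it serves as the probabilistic metric, while the Brier score here takes a single vector of predicted proportions.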