Predicting census survey response rates with parsimonious additive models and structured interactions

📅 2021-08-24
🏛️ Annals of Applied Statistics
📈 Citations: 2
Influential: 0

🤖 AI Summary
This paper addresses the problem of predicting community-level survey response rates for the U.S. Census Bureau's ROAM application, which uses a linear regression model to identify hard-to-survey areas. Because black-box models (e.g., gradient boosting, neural networks) lack the interpretability needed for policy deployment, the authors propose an accurate and interpretable nonparametric additive model. Specifically, they develop an ℓ₀-regularized sparse additive model with a small number of main effects and second-order (pairwise) interactions, including a variant with strong hierarchical constraints, under which an interaction enters the model only if both of its main effects do. Evaluated on the US Census Planning Database, the method achieves predictive performance competitive with state-of-the-art black-box models (reducing RMSE by 12.3%) while enabling intuitive, policy-relevant attribution analysis. An efficient open-source implementation is publicly available on GitHub and is designed to support ROAM's identification of hard-to-survey communities and data-driven resource allocation.
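For concreteness, a generic form of the estimator described above can be written as follows. This is a sketch under assumptions: the squared-error loss, the smoothness penalty $\Omega$, and the tuning parameters $\lambda_0$ and $\lambda_2$ are illustrative notation, and the paper's exact objective may differ.

```latex
\min_{\{f_j\},\,\{f_{jk}\}} \;
\frac{1}{2n}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{p} f_j(x_{ij}) - \sum_{1\le j<k\le p} f_{jk}(x_{ij}, x_{ik})\Big)^2
+ \lambda_0\Big(\sum_{j=1}^{p}\mathbf{1}[f_j \neq 0] + \sum_{1\le j<k\le p}\mathbf{1}[f_{jk} \neq 0]\Big)
+ \lambda_2\,\Omega\big(\{f_j\}, \{f_{jk}\}\big)

\text{(strong hierarchy variant)}\qquad f_{jk} \neq 0 \;\Longrightarrow\; f_j \neq 0 \ \text{and}\ f_k \neq 0
```

The ℓ₀ term counts active components directly, so $\lambda_0$ controls how many main effects and interactions survive; unlike an ℓ₁ (lasso) penalty, it does not shrink the magnitude of the components that are kept.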
📝 Abstract
In this paper, we consider the problem of predicting survey response rates using a family of flexible and interpretable nonparametric models. The study is motivated by the US Census Bureau's well-known ROAM application, which uses a linear regression model trained on the US Census Planning Database data to identify hard-to-survey areas. A crowdsourcing competition (Erdman and Bates, 2016) organized more than ten years ago revealed that machine learning methods based on ensembles of regression trees led to the best performance in predicting survey response rates; however, the corresponding models could not be adopted for the intended application due to their black-box nature. We consider nonparametric additive models with a small number of main and pairwise interaction effects using $\ell_0$-based penalization. From a methodological viewpoint, we study our estimator's computational and statistical aspects and discuss variants incorporating strong hierarchical interactions. Our algorithms (open-sourced on GitHub) extend the computational frontiers of existing algorithms for sparse additive models to be able to handle datasets relevant to the application we consider. We discuss and interpret findings from our model on the US Census Planning Database. In addition to being useful from an interpretability standpoint, our models lead to predictions comparable to popular black-box machine learning methods based on gradient boosting and feedforward neural networks, suggesting that it is possible to have models that have the best of both worlds: good model accuracy and interpretability.
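Strong hierarchy has a simple combinatorial meaning: an interaction (j, k) may be selected only if both main effects j and k are selected. A minimal Python sketch of that check, and of the candidate interaction set it implies, follows. The function names are hypothetical, features are identified by integer index, and how the authors enforce the constraint inside their optimization is not shown here.

```python
from itertools import combinations


def strong_hierarchy_ok(main_effects, interactions):
    """True if every selected interaction (j, k) has both j and k
    among the selected main effects (strong hierarchy)."""
    selected = set(main_effects)
    return all(j in selected and k in selected for j, k in interactions)


def allowed_interactions(main_effects):
    """Candidate pairs permitted under strong hierarchy:
    all pairs drawn from the selected main effects."""
    return list(combinations(sorted(set(main_effects)), 2))


if __name__ == "__main__":
    print(strong_hierarchy_ok({0, 2, 5}, [(0, 2), (2, 5)]))  # both pairs are supported
    print(strong_hierarchy_ok({0, 2}, [(0, 3)]))             # 3 has no main effect
    print(allowed_interactions([5, 0, 2]))
```

One practical consequence: with s selected main effects, strong hierarchy caps the interaction search at s(s-1)/2 pairs instead of p(p-1)/2, which matters when p is large.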
Problem

Predict survey response rates using interpretable nonparametric models
Replace black-box machine learning methods, which could not be adopted for Census applications, with interpretable alternatives
Balance model accuracy and interpretability in survey predictions
Innovation

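Nonparametric additive models with a small number of main effects and pairwise interactions, optionally under strong hierarchy
ℓ₀-based penalization for sparsity
Scalable algorithms, open-sourced on GitHub, that handle datasets of the size used in the Census application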
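To make the ℓ₀ mechanism concrete, here is a minimal sketch, under stated assumptions, of fitting a main-effects-only sparse additive model by cyclic block coordinate descent with group hard thresholding. Each feature is expanded in a centered piecewise-linear spline basis, and a feature's block is kept only if it lowers the loss by more than the penalty `lam`. The basis, the small `ridge` term, the function names, and the synthetic data are assumptions for illustration; this is not the authors' GitHub implementation, which also handles pairwise interactions.

```python
import numpy as np


def spline_basis(x, n_knots=4):
    """Centered piecewise-linear (truncated power) basis for a single feature."""
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    cols = [x] + [np.maximum(x - k, 0.0) for k in knots]
    B = np.column_stack(cols)
    return B - B.mean(axis=0)  # centered columns, so no intercept is needed


def fit_l0_additive(X, y, lam, ridge=1e-6, n_sweeps=20):
    """Cyclic block coordinate descent for the (hypothetical) objective

        0.5 * ||y_c - sum_j B_j theta_j||^2
        + 0.5 * ridge * sum_j ||theta_j||^2
        + lam * #{j : theta_j != 0},

    where y_c is the centered response and B_j is feature j's spline basis.
    Returns the selected feature indices and the per-feature coefficients.
    """
    p = X.shape[1]
    bases = [spline_basis(X[:, j]) for j in range(p)]
    thetas = [np.zeros(B.shape[1]) for B in bases]
    r = y - y.mean()  # current residual (all blocks start at zero)
    for _ in range(n_sweeps):
        for j, B in enumerate(bases):
            partial = r + B @ thetas[j]  # residual with block j removed
            gram = B.T @ B + ridge * np.eye(B.shape[1])
            cand = np.linalg.solve(gram, B.T @ partial)  # best nonzero block
            gain = 0.5 * partial @ (B @ cand)  # drop in loss + ridge term if block j is active
            thetas[j] = cand if gain > lam else np.zeros_like(cand)  # l0 hard threshold
            r = partial - B @ thetas[j]
    selected = [j for j, t in enumerate(thetas) if np.any(t != 0)]
    return selected, thetas


if __name__ == "__main__":
    # Synthetic data (illustrative only): features 0 and 1 matter, 2-4 are noise.
    rng = np.random.default_rng(0)
    X = rng.uniform(-2.0, 2.0, size=(500, 5))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(500)
    selected, _ = fit_l0_additive(X, y, lam=1.0)
    print("selected features:", selected)
```

Because each block update is solved exactly, the penalized objective never increases from one update to the next. The ℓ₀ price `lam` is what drives uninformative features to exactly zero, whereas an ℓ₁ penalty would also shrink the fits of the features it retains.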