🤖 AI Summary
This study addresses the bias in estimated inequality measures, and the resulting failure of standard statistical inference, caused by nonignorable nonresponse in survey data, where response probabilities depend on the unobserved outcomes. The authors propose a semiparametric full-likelihood approach that leverages callback data from repeated contact attempts. Combining large-sample asymptotic theory with an EM algorithm whose steps are either closed-form or reduce to fitting a logistic regression, the method corrects nonresponse bias and supports asymptotically valid inference for inequality measures such as the Gini index and the Theil index, while leaving the outcome distribution unspecified. As the first work to combine callback data with a semiparametric full-likelihood framework, the approach substantially reduces nonresponse bias in simulations, achieves efficiency close to that under complete response, and demonstrates practical utility on Consumer Expenditure Survey data.
📝 Abstract
This paper develops semiparametric methods for estimation and inference of widely used inequality measures when survey data are subject to nonignorable nonresponse, a challenging setting in which response probabilities depend on the unobserved outcomes. Such nonresponse mechanisms are common in household surveys and invalidate standard inference procedures due to selection bias and lack of population representativeness. We address this problem by exploiting callback data from repeated contact attempts and adopting a semiparametric model that leaves the outcome distribution unspecified. We construct semiparametric full-likelihood estimators for the underlying distribution and the associated inequality measures, and establish their large-sample properties for a broad class of functionals, including quantiles, the Theil index, and the Gini index. Explicit asymptotic variance expressions are derived, enabling valid Wald-type inference under nonignorable nonresponse. To facilitate implementation, we propose a stable and computationally convenient expectation-maximization algorithm, whose steps either admit closed-form expressions or reduce to fitting a standard logistic regression model. Simulation studies demonstrate that the proposed procedures effectively correct nonresponse bias and achieve near-benchmark efficiency. An application to Consumer Expenditure Survey data illustrates the practical gains from incorporating callback information when making inference on inequality measures.
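To make the target quantities concrete, the following is a minimal sketch of how the Gini and Theil indices can be evaluated once an estimated outcome distribution is available as support points with probability masses. It is an illustration under stated assumptions, not the paper's estimator: the function names `weighted_gini`, `weighted_theil`, and `wald_interval` are hypothetical, the masses `p` stand in for whatever the fitted likelihood assigns to the observed outcomes, and the standard error passed to `wald_interval` would have to come from the paper's asymptotic variance expressions, which are not reproduced here.

```python
from statistics import NormalDist

import numpy as np


def _normalize(y, p):
    """Return outcomes and masses as float arrays, with masses summing to 1."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return y, p / p.sum()


def weighted_gini(y, p):
    """Gini index of a discrete distribution: E|Y - Y'| / (2 * E[Y])."""
    y, p = _normalize(y, p)
    mu = np.sum(p * y)
    # Pairwise absolute differences, weighted by the product of masses.
    diffs = np.abs(y[:, None] - y[None, :])
    return float(p @ diffs @ p / (2.0 * mu))


def weighted_theil(y, p):
    """Theil T index: E[(Y / mu) * log(Y / mu)]; requires strictly positive y."""
    y, p = _normalize(y, p)
    if np.any(y <= 0):
        raise ValueError("Theil index requires strictly positive outcomes")
    ratio = y / np.sum(p * y)
    return float(np.sum(p * ratio * np.log(ratio)))


def wald_interval(estimate, std_error, level=0.95):
    """Two-sided Wald interval: estimate +/- z * std_error.

    std_error is an input here (assumption); it is not derived in this sketch.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


if __name__ == "__main__":
    # Toy example: two outcome values with equal mass (illustrative only).
    outcomes = [1.0, 3.0]
    masses = [0.5, 0.5]
    print("Gini :", weighted_gini(outcomes, masses))
    print("Theil:", weighted_theil(outcomes, masses))
    # Placeholder standard error, chosen only to show the call.
    print("Wald :", wald_interval(weighted_gini(outcomes, masses), 0.02))
```

Under equal masses the functions reduce to the usual sample Gini and Theil indices. Replacing the equal masses with the masses from a nonresponse-adjusted fit is what would shift the estimates away from the naive respondent-only values.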