🤖 AI Summary
Traditional respondent-driven sampling (RDS) inference typically assumes random recruitment. In practice, individuals recruit peers differentially according to multiple covariates, both categorical and continuous, and ignoring this can bias estimates. This work proposes a multivariate differential recruitment (MDR) framework that jointly models the influence of several covariates on recruitment and formalizes the RDS process as a Markov process whose transition probabilities depend on node- or edge-level covariates. Building on this foundation, the authors extend existing prevalence estimators and use a slightly modified neighborhood bootstrap for variance estimation. The approach is assessed in simulation studies across a range of network structures and sampling configurations, and it is applied to RDS survey data collected from the adult Venezuelan population living in the Metropolitan Region of Santiago, Chile.
📝 Abstract
Respondent-Driven Sampling (RDS) is a chain-referral design used for collecting data from hidden or hard-to-reach populations through their social networks. In RDS, respondents recruit their peers from the population of interest. As such, inference with RDS data commonly relies on estimated sampling probabilities derived from specific recruitment assumptions. Early literature assumes random recruitment, which is often unrealistic because individuals may recruit based on their personal preferences. This behavior is known as Differential Recruitment (DR). Recent works have incorporated univariate categorical DR in the estimation procedures. The main objective of this paper is to introduce Multivariate Differential Recruitment (MDR), a framework that incorporates multiple simultaneous covariates, both categorical and continuous, into the sampling representation. We model RDS as a Markov process with transition probabilities that depend on continuous or categorical variables associated with nodes or their ties. We then extend various prevalence estimators to this multivariate framework and implement a slightly modified neighborhood bootstrap for variance estimation. The proposed methodology is assessed through simulation studies for a range of network and sampling features. It is applied to an RDS study conducted among the adult Venezuelan population living in the Metropolitan Region of Santiago, Chile.
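To make the idea of covariate-dependent transition probabilities concrete, the following is a minimal sketch, not the paper's actual specification. It assumes a multinomial-logit form in which the probability that respondent i recruits neighbor j is proportional to exp(beta · x_ij), where x_ij is a vector of tie-level covariates. The functional form, the toy network, the covariates (`same_group`, `age_gap`), and the coefficient values are all illustrative assumptions.

```python
import math


def transition_matrix(adjacency, edge_covariates, beta):
    """Build a row-stochastic recruitment matrix.

    ASSUMPTION: a multinomial-logit form, P[i][j] proportional to
    exp(beta . x_ij) over the neighbors j of i. The abstract does not
    state the functional form; this is only one plausible choice.

    adjacency       : dict mapping node -> list of neighbor nodes
    edge_covariates : function (i, j) -> list of floats (tie-level covariates)
    beta            : list of floats, one coefficient per covariate
    """
    P = {}
    for i, neighbors in adjacency.items():
        scores = {}
        for j in neighbors:
            linear = sum(b * x for b, x in zip(beta, edge_covariates(i, j)))
            scores[j] = math.exp(linear)
        total = sum(scores.values())
        # Normalize so that each respondent's recruitment probabilities sum to 1.
        P[i] = {j: s / total for j, s in scores.items()}
    return P


# Hypothetical toy network with one categorical and one continuous attribute.
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
group = {0: "a", 1: "a", 2: "b", 3: "b"}
age = {0: 25.0, 1: 30.0, 2: 45.0, 3: 27.0}


def edge_covariates(i, j):
    same_group = 1.0 if group[i] == group[j] else 0.0  # categorical match
    age_gap = abs(age[i] - age[j]) / 10.0              # continuous distance
    return [same_group, age_gap]


# beta = 0 recovers random recruitment: uniform over a respondent's neighbors.
P_random = transition_matrix(adjacency, edge_covariates, beta=[0.0, 0.0])

# Nonzero beta: preference for same-group peers and for peers close in age.
P_mdr = transition_matrix(adjacency, edge_covariates, beta=[1.0, -0.5])
```

Under this sketch, setting every coefficient to zero reduces the process to the random-recruitment assumption of the early literature, while nonzero coefficients tilt each respondent's recruitment toward peers with particular covariate values.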