M-estimation under Two-Phase Multiwave Sampling with Applications to Prediction-Powered Inference

📅 2026-02-18
🤖 AI Summary
This study addresses how to combine low-cost proxy variables that are available for every unit, such as machine learning predictions, with a limited number of high-cost, high-quality measurements collected adaptively under two-phase multiwave sampling. Adaptive collection of the expensive measurements can improve estimation efficiency, but it complicates statistical inference, and relying on proxies alone can introduce bias. The paper proposes a Multiwave Predict-Then-Debias M-estimator that uses the proxies to gain efficiency and the expensive measurements to remove the resulting bias. The authors establish asymptotic linearity and normality of the estimator under this adaptive design and construct asymptotically valid confidence intervals. They also develop an approximately greedy sampling strategy that improves efficiency relative to uniform sampling. Data-based simulation studies support the theory and show efficiency gains.

📝 Abstract
In two-phase multiwave sampling, inexpensive measurements are collected on a large sample and expensive, more informative measurements are adaptively obtained on subsets of units across multiple waves. Adaptively collecting the expensive measurements can increase efficiency but complicates statistical inference. We give valid estimators and confidence intervals for M-estimation under adaptive two-phase multiwave sampling. We focus on the case where proxies for the expensive variables -- such as predictions from pretrained machine learning models -- are available for all units and propose a Multiwave Predict-Then-Debias estimator that combines proxy information with the expensive, higher-quality measurements to improve efficiency while removing bias. We establish asymptotic linearity and normality and propose asymptotically valid confidence intervals. We also develop an approximately greedy sampling strategy that improves efficiency relative to uniform sampling. Data-based simulation studies support the theoretical results and demonstrate efficiency gains.
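The predict-then-debias idea can be illustrated in its simplest form: estimating a mean from proxies observed on every unit plus expensive measurements on a sampled subset. The sketch below is not the paper's Multiwave Predict-Then-Debias estimator. It assumes a single wave, a known and strictly positive sampling probability for each unit, and a mean rather than a general M-estimation target. The function name `debiased_mean` and its arguments are hypothetical, and the standard error is a naive one that treats units as independent, so it does not account for adaptive multiwave sampling.

```python
import numpy as np


def debiased_mean(proxy, y, labeled, pi):
    """Illustrative single-wave predict-then-debias estimate of a mean.

    proxy   : cheap proxy (e.g. a model prediction) for every unit
    y       : expensive measurement; only read where `labeled` is True
    labeled : boolean mask, True where the expensive measurement was taken
    pi      : assumed-known probability that each unit was selected (> 0)
    """
    proxy = np.asarray(proxy, dtype=float)
    y = np.asarray(y, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    pi = np.asarray(pi, dtype=float)

    # Proxy error, available only on labeled units; zero elsewhere.
    resid = np.where(labeled, y - proxy, 0.0)

    # Proxy term plus an inverse-probability-weighted bias correction.
    contrib = proxy + resid / pi

    est = contrib.mean()
    # Naive standard error that ignores any adaptivity in the sampling.
    se = contrib.std(ddof=1) / np.sqrt(len(contrib))
    return est, se


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    truth = rng.normal(loc=1.0, scale=1.0, size=n)       # expensive variable
    proxy = truth + 0.3 + rng.normal(scale=0.5, size=n)  # biased, noisy proxy
    pi = np.full(n, 0.1)                                 # uniform sampling
    labeled = rng.random(n) < pi
    y = np.where(labeled, truth, np.nan)                 # unobserved -> NaN

    est, se = debiased_mean(proxy, y, labeled, pi)
    print("proxy-only mean:", proxy.mean())
    print("debiased mean:  ", est, "+/-", 1.96 * se)
```

Replacing the uniform `pi` with probabilities that are larger where the proxy is expected to be less accurate is the kind of design choice the abstract's sampling strategy targets, although the rule used in the paper is not reproduced here.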
Problem

Research questions and friction points this paper is trying to address.

M-estimation
two-phase sampling
multiwave sampling
adaptive sampling
statistical inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

M-estimation
two-phase multiwave sampling
prediction-powered inference
debiasing
adaptive sampling
Dan M. Kluger
Institute for Data, Systems, and Society, Massachusetts Institute of Technology; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
Stephen Bates
Assistant Professor, MIT EECS
Statistics, Machine Learning, Artificial Intelligence, Uncertainty Quantification