M-estimation under Two-Phase Multiwave Sampling with Applications to Prediction-Powered Inference

📅 2026-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses M-estimation under adaptive two-phase multiwave sampling, where inexpensive measurements are available for a large sample and expensive, higher-quality measurements are collected adaptively on subsets of units across multiple waves. Adaptive collection can increase efficiency but complicates statistical inference. For the case where proxies for the expensive variables, such as predictions from pretrained machine learning models, are available for all units, the paper proposes a Multiwave Predict-Then-Debias estimator that combines the proxies with the expensive measurements to improve efficiency while removing bias. The authors establish asymptotic linearity and normality for the estimator and propose asymptotically valid confidence intervals. They also develop an approximately greedy sampling strategy that improves efficiency relative to uniform sampling. Data-based simulation studies support the theoretical results and demonstrate efficiency gains.

📝 Abstract

In two-phase multiwave sampling, inexpensive measurements are collected on a large sample and expensive, more informative measurements are adaptively obtained on subsets of units across multiple waves. Adaptively collecting the expensive measurements can increase efficiency but complicates statistical inference. We give valid estimators and confidence intervals for M-estimation under adaptive two-phase multiwave sampling. We focus on the case where proxies for the expensive variables -- such as predictions from pretrained machine learning models -- are available for all units and propose a Multiwave Predict-Then-Debias estimator that combines proxy information with the expensive, higher-quality measurements to improve efficiency while removing bias. We establish asymptotic linearity and normality and propose asymptotically valid confidence intervals. We also develop an approximately greedy sampling strategy that improves efficiency relative to uniform sampling. Data-based simulation studies support the theoretical results and demonstrate efficiency gains.
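To make the predict-then-debias idea concrete, the following is a minimal sketch for the simplest possible target, a population mean, under a single wave of independent Bernoulli sampling with known probabilities. It is not the paper's Multiwave Predict-Then-Debias estimator: the function name `predict_then_debias_mean`, the inverse-probability-weighted correction, the plug-in standard error, and the simulated data are all assumptions chosen for illustration, and the multiwave case (where sampling probabilities depend on earlier waves) requires the theory developed in the paper.

```python
import numpy as np

def predict_then_debias_mean(proxy, y, labeled, pi):
    """Illustrative (hypothetical) predict-then-debias estimate of a mean.

    proxy   : cheap proxy (e.g. ML prediction), available for every unit
    y       : expensive measurement; entries for unlabeled units are ignored
    labeled : boolean mask, True where the expensive measurement was collected
    pi      : known, strictly positive sampling probability for each unit
    """
    proxy = np.asarray(proxy, dtype=float)
    y = np.asarray(y, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    pi = np.asarray(pi, dtype=float)

    # Inverse-probability-weighted residual; zero for units without a label.
    correction = np.where(labeled, (y - proxy) / pi, 0.0)
    terms = proxy + correction

    estimate = terms.mean()
    # Plug-in standard error, assuming independent units and fixed pi.
    std_err = terms.std(ddof=1) / np.sqrt(terms.size)
    return estimate, std_err

# Simulated example (assumed data, not from the paper): a proxy with bias 0.5.
rng = np.random.default_rng(0)
n = 20_000
y_true = rng.normal(1.0, 1.0, n)
proxy = y_true + 0.5 + rng.normal(0.0, 0.3, n)
pi = np.full(n, 0.1)                 # uniform sampling, 10% of units labeled
labeled = rng.random(n) < pi
y_obs = np.where(labeled, y_true, np.nan)

est, se = predict_then_debias_mean(proxy, y_obs, labeled, pi)
ci = (est - 1.96 * se, est + 1.96 * se)  # approximate 95% interval
```

In this sketch the proxy mean alone is off by roughly the proxy's bias, while the weighted residual term corrects it using only the labeled subset. Non-uniform `pi` could be passed in the same way, which is where a sampling strategy other than uniform would enter.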

Problem

Research questions and friction points this paper is trying to address.

M-estimation

two-phase sampling

multiwave sampling

adaptive sampling

statistical inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

M-estimation

two-phase multiwave sampling

prediction-powered inference