Can SGD Select Good Fishermen? Local Convergence under Self-Selection Biases and Beyond

📅 2025-04-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses $k$-component linear regression estimation in $\mathbb{R}^d$ under self-selection bias, a setting where standard identifiability and convergence guarantees fail. To overcome the exponential running times (e.g., $\exp(k)$ dependence) and weak guarantees of prior methods, which lacked local convergence analyses, we first establish a geometric framework linking self-selection mechanisms to coarsened data, recasting the problem as statistical inference under a convex partition of the sample space. We propose the first locally convergent polynomial-time algorithm, with running time $k^{O(k)} + \mathrm{poly}(d, k, 1/\varepsilon)$ for $\varepsilon$-accuracy, a substantial improvement over prior exponential-time approaches. The framework extends beyond the canonical maximum-selection model to nonstandard mechanisms, including second-price auctions, and enables new coarsened-data tasks such as coarse Gaussian mean estimation.

📝 Abstract
We revisit the problem of estimating $k$ linear regressors with self-selection bias in $d$ dimensions with the maximum selection criterion, as introduced by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [CDIZ23, STOC'23]. Our main result is a $\operatorname{poly}(d,k,1/\varepsilon) + k^{O(k)}$ time algorithm for this problem, which yields an improvement in the running time of the algorithms of [CDIZ23] and [GM24, arXiv]. We achieve this by providing the first local convergence algorithm for self-selection, thus resolving the main open question of [CDIZ23]. To obtain this algorithm, we reduce self-selection to a seemingly unrelated statistical problem called coarsening. Coarsening occurs when one does not observe the exact value of the sample but only some set (a subset of the sample space) that contains the exact value. Inference from coarse samples arises in various real-world applications due to rounding by humans and algorithms, limited precision of instruments, and lag in multi-agent systems. Our reduction to coarsening is intuitive and relies on the geometry of the self-selection problem, which enables us to bypass the limitations of previous analytic approaches. To demonstrate its applicability, we provide a local convergence algorithm for linear regression under another self-selection criterion, which is related to second-price auction data. Further, we give the first polynomial time local convergence algorithm for coarse Gaussian mean estimation given samples generated from a convex partition. Previously, only a sample-efficient algorithm was known due to Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21, COLT'21].
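To make the coarsening setting concrete, here is a minimal 1-D sketch of coarse Gaussian mean estimation: samples from $N(\mu, 1)$ are observed only through the unit interval containing them (a simple convex partition of the line), and an EM-style iteration recovers the mean from the intervals alone. This is an illustrative toy, not the paper's algorithm; all function names are hypothetical.

```python
import math
import random

def phi(x):
    # Standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def trunc_mean(mu, a, b):
    # E[X | a <= X < b] for X ~ N(mu, 1)
    za, zb = a - mu, b - mu
    return mu + (phi(za) - phi(zb)) / (Phi(zb) - Phi(za))

def em_coarse_mean(bins, iters=100):
    # EM for the mean: the E-step replaces each coarse observation
    # by its conditional mean given the current estimate; the
    # M-step averages those imputed values.
    mu = 0.0
    for _ in range(iters):
        mu = sum(trunc_mean(mu, a, b) for a, b in bins) / len(bins)
    return mu

random.seed(0)
mu_true = 1.3
samples = [random.gauss(mu_true, 1.0) for _ in range(20000)]
# Coarsening: keep only the unit interval [floor(x), floor(x)+1) per sample
bins = [(math.floor(x), math.floor(x) + 1) for x in samples]
mu_hat = em_coarse_mean(bins)
```

Despite never seeing exact values, the iteration converges to an estimate close to the true mean; the paper's contribution is establishing such local convergence for general convex partitions in high dimensions, and reducing self-selection to this setting.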
Problem

Research questions and friction points this paper is trying to address.

Estimating linear regressors with self-selection bias efficiently
Reducing self-selection to coarsening for improved inference
Developing local convergence algorithms for coarse Gaussian mean estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local convergence algorithm for self-selection
Reduction of self-selection to the coarsening problem
Polynomial-time coarse Gaussian mean estimation