Attainability of Two-Point Testing Rates for Finite-Sample Location Estimation

📅 2025-02-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies when the two-point testing lower bound, expressed via the Hellinger modulus of continuity, can be attained in univariate location estimation. It covers both location estimation, where the distribution is known up to translation, and adaptive location estimation, where the distribution is unknown. An estimator is said to nearly attain the bound if its error is at most polylogarithmically larger than the Hellinger modulus of continuity for $\tilde{\Omega}(n)$ samples. The main result is a near-linear-time, parameter-free algorithm that nearly attains this rate for mixtures of symmetric, log-concave distributions with a common mean. In contrast, the rate is not nearly attainable in the adaptive setting even for symmetric, unimodal distributions. For location estimation with a known distribution, the rate is nearly attainable for unimodal distributions but unattainable for symmetric distributions. Rates of this kind can be far better than the sub-Gaussian rate of $\Theta(1/\sqrt{n})$: for example, $\operatorname{Unif}(\mu-1,\mu+1)$ permits error $\Theta(1/n)$.
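
To make the two rates quoted above concrete, here is a short worked calculation. It assumes the standard definition of the Hellinger modulus of continuity for a location family and the normalization $d_H^2 = 1 - \int \sqrt{pq}$. The paper may use a different constant or convention, so treat the constants as illustrative only.

$$
\omega_H(\varepsilon) = \sup\{\, |\Delta| : d_H(P_\mu, P_{\mu+\Delta}) \le \varepsilon \,\}, \qquad d_H^2(P, Q) = 1 - \int \sqrt{p(x)\, q(x)}\, dx .
$$

For $\operatorname{Unif}(\mu-1,\mu+1)$ and $0 \le \Delta \le 2$, the two densities both equal $1/2$ on an overlap of length $2-\Delta$, so

$$
d_H^2 = 1 - \frac{2-\Delta}{2} = \frac{\Delta}{2}, \qquad d_H^2 \le \frac{1}{n} \iff \Delta \le \frac{2}{n} .
$$

For $N(\mu, 1)$, the same quantity is

$$
d_H^2 = 1 - e^{-\Delta^2/8} \approx \frac{\Delta^2}{8} \ \text{(small } \Delta\text{)}, \qquad d_H^2 \le \frac{1}{n} \ \text{gives roughly}\ \Delta \le \sqrt{\frac{8}{n}} .
$$

Evaluating the modulus at $\varepsilon \approx 1/\sqrt{n}$ therefore gives $\Theta(1/n)$ for the uniform family and $\Theta(1/\sqrt{n})$ for the Gaussian family, matching the rates stated above.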

📝 Abstract

LeCam's two-point testing method yields perhaps the simplest lower bound for estimating the mean of a distribution: roughly, if it is impossible to well-distinguish a distribution centered at $\mu$ from the same distribution centered at $\mu+\Delta$, then it is impossible to estimate the mean by better than $\Delta/2$. It is setting-dependent whether or not a nearly matching upper bound is attainable. We study the conditions under which the two-point testing lower bound can be attained for univariate mean estimation; both in the setting of location estimation (where the distribution is known up to translation) and adaptive location estimation (unknown distribution). Roughly, we will say an estimate nearly attains the two-point testing lower bound if it incurs error that is at most polylogarithmically larger than the Hellinger modulus of continuity for $\tilde{\Omega}(n)$ samples. Adaptive location estimation is particularly interesting as some distributions admit much better guarantees than sub-Gaussian rates (e.g. $\operatorname{Unif}(\mu-1,\mu+1)$ permits error $\Theta(\frac{1}{n})$, while the sub-Gaussian rate is $\Theta(\frac{1}{\sqrt{n}})$), yet it is not obvious whether these rates may be adaptively attained by one unified approach. Our main result designs an algorithm that nearly attains the two-point testing rate for mixtures of symmetric, log-concave distributions with a common mean. Moreover, this algorithm runs in near-linear time and is parameter-free. In contrast, we show the two-point testing rate is not nearly attainable even for symmetric, unimodal distributions. We complement this with results for location estimation, showing the two-point testing rate is nearly attainable for unimodal distributions, but unattainable for symmetric distributions.
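
The gap between the $\Theta(\frac{1}{n})$ and $\Theta(\frac{1}{\sqrt{n}})$ rates for the uniform example can be seen in a small simulation. The sketch below is not the paper's algorithm. It compares the sample mean against the midrange, a classical estimator that is only sensible when the distribution is known to have sharp, symmetric endpoints. The helper names `midrange` and `mean_abs_error`, and the trial counts, are illustrative choices.

```python
import random
import statistics


def midrange(xs):
    """Midpoint of the smallest and largest observation."""
    return (min(xs) + max(xs)) / 2.0


def mean_abs_error(estimator, n, mu=0.0, trials=200, seed=0):
    """Average |estimate - mu| over repeated samples from Unif(mu-1, mu+1)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    total = 0.0
    for _ in range(trials):
        xs = [rng.uniform(mu - 1.0, mu + 1.0) for _ in range(n)]
        total += abs(estimator(xs) - mu)
    return total / trials


if __name__ == "__main__":
    # Quadrupling n should roughly halve the sample-mean error (1/sqrt(n))
    # and roughly quarter the midrange error (1/n).
    for n in (100, 400, 1600):
        err_mean = mean_abs_error(statistics.fmean, n)
        err_mid = mean_abs_error(midrange, n)
        print(f"n={n:5d}  sample mean: {err_mean:.5f}  midrange: {err_mid:.5f}")
```

The midrange only works here because the shape of the distribution is assumed in advance. The adaptive question in the abstract is whether a single procedure can achieve such rates without that knowledge.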

Problem

Research questions and friction points this paper is trying to address.

Conditions for attaining two-point testing lower bound

Adaptive location estimation for unknown distributions

Algorithm design for mixtures of symmetric, log-concave distributions with a common mean

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-point testing method

Adaptive location estimation

Near-linear time algorithm
