AI Summary
This work studies identifiability and efficient estimation of the mean of a high-dimensional Gaussian distribution when each sample is observed only through a coarse convex set that contains it, for example because of rounding or sensor limitations. Drawing on convex geometry, statistical identifiability theory, and optimization, the paper gives the first complete characterization, in the form of necessary and sufficient conditions, of when the mean is identifiable. Building on this characterization, it proposes the first estimator that is both computationally efficient (polynomial time) and sample-efficient, resolving two questions left open by prior work in this setting.
Abstract
Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample $x$ is drawn from a $d$-dimensional Gaussian distribution with identity covariance, but is revealed only through the set of a partition containing $x$. When the coarse samples, roughly speaking, have "low" information, the mean cannot be uniquely recovered from observed samples (i.e., the problem is not identifiable). Recent work by Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21] established that sample-efficient mean estimation is possible when the unknown mean is identifiable and the partition consists of only convex sets. Moreover, they showed that without convexity, mean estimation becomes NP-hard. However, two fundamental questions remained open: (1) When is the mean identifiable under convex partitions? (2) Is computationally efficient estimation possible under identifiability and convex partitions? This work resolves both questions. [...]
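To make the observation model concrete, the following is a minimal sketch and not the paper's method or estimator. It assumes two illustrative convex partitions chosen here for the example: unit cubes (coordinate-wise rounding) and slabs along the first coordinate. The function names `coarsen_by_rounding` and `coarsen_by_slab`, the means `mu` and `mu_alt`, and the sample size are all hypothetical. The slab partition illustrates one simple way identifiability can fail: two means that differ only in coordinates the partition ignores produce the same coarse observations.

```python
import numpy as np


def coarsen_by_rounding(x, width=1.0):
    """Return, per sample, the integer index of the cube that contains it.

    The cubes [k*width, (k+1)*width)^d are convex and partition R^d,
    so this mimics coordinate-wise rounding of a measurement.
    """
    return np.floor(x / width).astype(int)


def coarsen_by_slab(x, width=1.0):
    """Return, per sample, the index of the slab along the first coordinate.

    Each slab {x : k*width <= x_1 < (k+1)*width} is convex, but the
    observation carries no information about coordinates 2..d.
    """
    return np.floor(x[:, 0] / width).astype(int)


rng = np.random.default_rng(0)
n, d = 1000, 3

# Hypothetical means: they agree in the first coordinate only.
mu = np.array([0.3, -1.2, 2.5])
mu_alt = np.array([0.3, 3.8, -1.5])

# Shared standard normal noise couples the two experiments
# (identity covariance, as in the abstract's setting).
z = rng.standard_normal((n, d))
x = mu + z
x_alt = mu_alt + z

cells = coarsen_by_rounding(x)          # shape (n, d): one cube index per sample
slabs = coarsen_by_slab(x)              # shape (n,): one slab index per sample
slabs_alt = coarsen_by_slab(x_alt)

# Under the slab partition the two means are indistinguishable.
print("slab observations identical:", np.array_equal(slabs, slabs_alt))
# Under the cube partition the observations differ.
print("cube observations identical:",
      np.array_equal(cells, coarsen_by_rounding(x_alt)))
```

The coupling through the shared noise `z` is only a device for the demonstration: since the slab index depends on the first coordinate alone, the distribution of slab observations is the same under `mu` and `mu_alt`, so no estimator could tell them apart from such data.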