AI Summary
This study investigates when sparse signal recovery is feasible from mixed-quality observations: a small number of low-noise, high-quality samples together with a large number of high-noise, low-quality samples. Using information-theoretic bounds and an analysis of the LASSO, it treats separately the settings in which the decoder knows (informed) or does not know (agnostic) the noise variance of each sample. The work introduces the “price of quality,” the number of low-quality samples needed to replace one high-quality sample, and shows that the information-theoretic conditions are highly sensitive to heterogeneity in data quality, whereas the LASSO is robust to it, with a threshold that depends only on the average noise level. In the agnostic setting, the information-theoretic sufficient condition is a linear trade-off between high- and low-quality samples in which one high-quality sample is worth at most two low-quality samples, and the LASSO achieves the same recovery threshold as in the homogeneous-noise case.
Abstract
We study sparse recovery when observations come from mixed-quality sources: a small collection of $n_1$ high-quality measurements with small noise variance and a larger collection of $n_2$ lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that it is sufficient for $(n_1, n_2)$ to satisfy a linear trade-off defining the Price of Quality: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder has no knowledge of the quality of the data, the price of quality is uniformly bounded; in particular, one high-quality sample is never worth more than two low-quality samples for this sufficient condition to hold. In the informed setting, where the decoder is informed of per-sample variances, the price of quality can grow arbitrarily large. On the algorithmic side, we analyze the LASSO in the agnostic setting and show that the recovery threshold matches the homogeneous-noise case and depends only on the average noise level, revealing a striking robustness of computational recovery to data heterogeneity. Together, these results give the first conditions for sparse recovery with mixed-quality data and expose a fundamental difference between how the information-theoretic and algorithmic thresholds adapt to changes in data quality.
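The two quantities the abstract relies on, the average noise level and the agnostic-setting bound on the price of quality, can be illustrated with a minimal sketch. The function names and the numeric values are hypothetical. Treating the "average noise level" as the sample-weighted mean of the per-sample noise variances is an assumption, since the abstract does not give the exact definition.

```python
def average_noise_variance(n_high, var_high, n_low, var_low):
    """Sample-weighted mean of the per-sample noise variances.

    Assumption: this is what "average noise level" means; the abstract
    does not state the exact definition.
    """
    total = n_high + n_low
    if total <= 0:
        raise ValueError("need at least one sample")
    return (n_high * var_high + n_low * var_low) / total


def low_quality_replacement_bound(n_high, price_of_quality=2.0):
    """Upper bound on the low-quality samples needed to replace n_high
    high-quality samples, given a bound on the price of quality.

    The default of 2.0 is the agnostic-setting bound quoted in the abstract
    for the information-theoretic sufficient condition.
    """
    return price_of_quality * n_high


# Two hypothetical designs: a heterogeneous mix and a homogeneous one,
# chosen so that both have the same average noise variance.
design_a = average_noise_variance(n_high=100, var_high=0.25, n_low=300, var_low=1.25)
design_b = average_noise_variance(n_high=0, var_high=0.25, n_low=400, var_low=1.0)
```

Under the stated assumption, `design_a` and `design_b` have the same average variance, so the abstract's LASSO result suggests they would share a recovery threshold despite very different noise profiles. The sketch only computes these summary quantities; it does not implement or verify the recovery guarantees.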