Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations

📅 2025-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the robust PAC learnability of sample-compressible distribution families under noise and adversarial perturbations. Specifically, it addresses two perturbation models: independent additive noise and adversarial contamination of an unknown subset of samples. For the first time, it establishes that sample compressibility preserves learnability under both models and derives necessary and sufficient conditions for robust PAC learnability in these settings. The authors introduce a novel "perturbation–quantization" analytical framework, which integrates sample compression schemes with robust statistical inference techniques to obtain tight upper bounds on sample complexity. This framework resolves two open problems: learning finite mixtures of high-dimensional uniform distributions under noise and adversarial perturbations, and learning Gaussian mixture models from adversarially corrupted samples. The results provide foundational theoretical guarantees and new algorithmic tools for robust density estimation and distribution learning.

📝 Abstract
Learning distribution families over $\mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at least) information-theoretically learnable and, if so, to characterize its sample complexity. In 2018, Ashtiani et al. reframed \emph{sample compressibility}, originally due to Littlestone and Warmuth (1986), as a structural property of distribution classes, proving that it guarantees PAC-learnability. This discovery subsequently enabled a series of recent advancements in deriving nearly tight sample complexity bounds for various high-dimensional open problems. It has been further conjectured that the converse also holds: every learnable class admits a tight sample compression scheme. In this work, we establish that sample-compressible families remain learnable even from perturbed samples, subject to a set of necessary and sufficient conditions. We analyze two models of data perturbation: (i) an additive independent noise model, and (ii) an adversarial corruption model, where an adversary manipulates a limited subset of the samples unknown to the learner. Our results are general and rely on minimal assumptions. We develop a perturbation-quantization framework that interfaces naturally with the compression scheme and leads to sample complexity bounds that scale gracefully with the noise level and corruption budget. As concrete applications, we establish new sample complexity bounds for learning finite mixtures of high-dimensional uniform distributions under both noise and adversarial perturbations, as well as for learning Gaussian mixture models from adversarially corrupted samples, resolving two open problems in the literature.
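To make the notion of a sample compression scheme concrete, here is a minimal toy sketch (not from the paper) for the textbook case of a single uniform distribution on an unknown interval: an encoder keeps only two of the drawn samples (the min and the max), and a decoder reconstructs the interval from that short message. The function names `compress` and `decode` are illustrative, not the paper's notation.

```python
import random

def compress(samples):
    """Encode an i.i.d. sample from Uniform[a, b] as just two points.

    A classic toy compression scheme: the sample minimum and maximum
    suffice to reconstruct a good estimate of the whole distribution.
    """
    return (min(samples), max(samples))

def decode(message):
    """Reconstruct the interval endpoints from the two kept points."""
    a_hat, b_hat = message
    return a_hat, b_hat

# Usage: draw samples from Uniform[2, 5] and recover the interval.
random.seed(0)
samples = [random.uniform(2.0, 5.0) for _ in range(1000)]
a_hat, b_hat = decode(compress(samples))
```

With 1000 samples, the recovered endpoints sit within roughly $3/n$ of the true ones, illustrating how a constant-size compressed message can pin down the distribution.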
Problem

Research questions and friction points this paper is trying to address.

Study learnability of sample-compressible distributions under noise
Analyze additive noise and adversarial corruption perturbation models
Establish sample complexity bounds for high-dimensional mixture models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sample compressibility ensures learnability under perturbations
Perturbation-quantization framework for robust learning
Minimal assumptions for noise and adversarial models
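The fragility that motivates the adversarial model is easy to see in the toy uniform example: a single planted outlier ruins a min/max decoder. A generic trimming heuristic (an illustration of the robustness idea, not the paper's perturbation-quantization algorithm) discards the most extreme points up to the corruption budget before decoding:

```python
import random

def robust_decode(samples, corruption_budget):
    """Estimate Uniform[a, b] endpoints when up to `corruption_budget`
    samples may be adversarial: a generic trimming heuristic that
    discards the most extreme points on each side before decoding.
    """
    s = sorted(samples)
    m = corruption_budget
    return s[m], s[len(s) - 1 - m]

# Usage: 1000 clean samples from Uniform[2, 5] plus 20 planted outliers.
random.seed(1)
clean = [random.uniform(2.0, 5.0) for _ in range(1000)]
outliers = [100.0] * 10 + [-100.0] * 10
a_hat, b_hat = robust_decode(clean + outliers, corruption_budget=20)
```

Trimming trades a small bias (it may also discard a few clean extremes) for immunity to the corrupted points, which is the qualitative trade-off the paper's sample complexity bounds quantify as a function of the corruption budget.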
Arefe Boushehrian
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Amir Najafi
imec, Belgium
SoC design · Ultra-low-power on-chip communication · Energy-efficient architectures