Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations

📅 2025-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the robust PAC learnability of sample-compressible distribution families under noise and adversarial perturbations. Specifically, it addresses two perturbation models: independent additive noise and adversarial contamination of an unknown subset of samples. For the first time, it establishes that sample compressibility preserves learnability under both models and derives necessary and sufficient conditions for robust PAC learnability in these settings. The authors introduce a novel "perturbation–quantization" analytical framework, which integrates sample compression schemes with robust statistical inference techniques to obtain tight upper bounds on sample complexity. This framework resolves two open problems: learning finite mixtures of high-dimensional uniform distributions under noise and adversarial perturbations, and learning Gaussian mixture models from adversarially corrupted samples. The results provide foundational theoretical guarantees and new algorithmic tools for robust density estimation and distribution learning.

📝 Abstract
Learning distribution families over $\mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at least) information-theoretically learnable and, if so, to characterize its sample complexity. In 2018, Ashtiani et al. reframed \emph{sample compressibility}, originally due to Littlestone and Warmuth (1986), as a structural property of distribution classes, proving that it guarantees PAC-learnability. This discovery subsequently enabled a series of recent advancements in deriving nearly tight sample complexity bounds for various high-dimensional open problems. It has been further conjectured that the converse also holds: every learnable class admits a tight sample compression scheme. In this work, we establish that sample-compressible families remain learnable even from perturbed samples, subject to a set of necessary and sufficient conditions. We analyze two models of data perturbation: (i) an additive independent noise model, and (ii) an adversarial corruption model, where an adversary manipulates a limited subset of the samples unknown to the learner. Our results are general and rely on minimal assumptions. We develop a perturbation-quantization framework that interfaces naturally with the compression scheme and leads to sample complexity bounds that scale gracefully with the noise level and corruption budget. As concrete applications, we establish new sample complexity bounds for learning finite mixtures of high-dimensional uniform distributions under both noise and adversarial perturbations, as well as for learning Gaussian mixture models from adversarially corrupted samples, resolving two open problems in the literature.
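To make the notion of a sample compression scheme concrete, here is a minimal toy sketch (not from the paper) for the textbook case of a single uniform distribution on an unknown interval: an encoder keeps only two of the drawn samples (the min and the max), and a decoder reconstructs the interval from that short message. The function names `compress` and `decode` are illustrative, not the paper's notation.

```python
import random

def compress(samples):
    """Encode an i.i.d. sample from Uniform[a, b] as just two points.

    A classic toy compression scheme: the sample minimum and maximum
    suffice to reconstruct a good estimate of the whole distribution.
    """
    return (min(samples), max(samples))

def decode(message):
    """Reconstruct the interval endpoints from the two kept points."""
    a_hat, b_hat = message
    return a_hat, b_hat

# Usage: draw samples from Uniform[2, 5] and recover the interval.
random.seed(0)
samples = [random.uniform(2.0, 5.0) for _ in range(1000)]
a_hat, b_hat = decode(compress(samples))
```

With 1000 samples, the recovered endpoints sit within roughly $3/n$ of the true ones, illustrating how a constant-size compressed message can pin down the distribution.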
Problem

Research questions and friction points this paper is trying to address.

Study learnability of sample-compressible distributions under noise
Analyze additive noise and adversarial corruption perturbation models
Establish sample complexity bounds for high-dimensional mixture models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sample compressibility ensures learnability under perturbations
Perturbation-quantization framework for robust learning
Minimal assumptions for noise and adversarial models
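The fragility that motivates the adversarial model is easy to see in the toy uniform example: a single planted outlier ruins a min/max decoder. A generic trimming heuristic (an illustration of the robustness idea, not the paper's perturbation-quantization algorithm) discards the most extreme points up to the corruption budget before decoding:

```python
import random

def robust_decode(samples, corruption_budget):
    """Estimate Uniform[a, b] endpoints when up to `corruption_budget`
    samples may be adversarial: a generic trimming heuristic that
    discards the most extreme points on each side before decoding.
    """
    s = sorted(samples)
    m = corruption_budget
    return s[m], s[len(s) - 1 - m]

# Usage: 1000 clean samples from Uniform[2, 5] plus 20 planted outliers.
random.seed(1)
clean = [random.uniform(2.0, 5.0) for _ in range(1000)]
outliers = [100.0] * 10 + [-100.0] * 10
a_hat, b_hat = robust_decode(clean + outliers, corruption_budget=20)
```

Trimming trades a small bias (it may also discard a few clean extremes) for immunity to the corrupted points, which is the qualitative trade-off the paper's sample complexity bounds quantify as a function of the corruption budget.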
Arefe Boushehrian
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Amir Najafi
imec, Belgium
SoC design · Ultra-low-power on-chip communication · Energy-efficient architectures