Certified Robustness from Approximate Gaussian Mixture Structures in Pretrained Latent Spaces

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing certified-robustness methods are often overly conservative because they ignore structure in the underlying data distribution. This work observes that pretrained encoders induce latent representations that approximately follow a Gaussian mixture, and uses that structure to construct classifiers with closed-form robustness certificates. Theoretically, it shows that robustness guarantees still hold when the latent distribution is only $\varepsilon$-close in KL divergence to a Gaussian mixture, with certified accuracy degrading gracefully in $\varepsilon$, so exact distributional assumptions are not required. Empirically, the method achieves state-of-the-art or competitive certified accuracy on CIFAR-10 and ImageNet while maintaining strong clean accuracy and low computational overhead.

📝 Abstract

Deep learning models are vulnerable to adversarial perturbations, raising important concerns for safety-critical deployment. Empirical defenses can achieve strong robustness in practice, but lack formal guarantees, motivating the need for certifiably robust classifiers. While certified methods provide formal guarantees, they often yield overly conservative bounds due to their inability to exploit structure in complex data distributions. In this work, we propose a framework for designing certifiably robust classifiers that leverages latent structure in data representations. We first analyze the Gaussian mixture setting, deriving necessary and sufficient conditions for the existence of robust classifiers and constructing a classifier with a closed-form robustness certificate and generalization guarantees. Our main contribution is to show that exact structure is not required: we prove that if a pretrained encoder maps inputs to a latent distribution that is $\varepsilon$-close (in KL divergence) to a Gaussian mixture, then certified accuracy degrades gracefully, with an explicit bound relating robustness under the true and approximate distributions. This result enables the direct use of pretrained models without requiring exact distributional assumptions. Empirically, our method achieves state-of-the-art or competitive certified accuracy on CIFAR-10 and ImageNet, while maintaining strong clean performance and low computational overhead. Overall, our work establishes approximate latent structure as a practical and principled route to certifiable robustness.
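To make the two ingredients of the abstract concrete, the following is a minimal sketch under simplifying assumptions; it is not the paper's classifier or its bound. It assumes a two-component, equal-weight Gaussian mixture with shared isotropic covariance $\sigma^2 I$ in the latent space, for which the Bayes-optimal rule is linear and the $\ell_2$ certified radius of a latent point is its distance to the decision hyperplane. The graceful-degradation term is illustrated with Pinsker's inequality (total variation is at most $\sqrt{\mathrm{KL}/2}$), a standard bound that may be looser than the one the paper derives. All function names (`fit_linear_rule`, `certify`, `gmm_certified_accuracy`, `pinsker_slack`) are hypothetical, and the radius is measured in latent space, not input space.

```python
import math
import numpy as np


def fit_linear_rule(mu0, mu1, sigma):
    """Bayes rule for an equal-weight mixture of N(mu0, s^2 I) and N(mu1, s^2 I).

    Returns (w, b) such that the log-likelihood ratio log p1(z)/p0(z) = w @ z + b.
    """
    w = (mu1 - mu0) / sigma**2
    b = -0.5 * (mu1 @ mu1 - mu0 @ mu0) / sigma**2
    return w, b


def certify(z, w, b):
    """Predicted label and l2 certified radius of latent point z.

    For a linear rule the prediction cannot change within a ball whose radius
    is the distance from z to the hyperplane {w @ z + b = 0}.
    """
    score = float(w @ z + b)
    label = int(score > 0)
    radius = abs(score) / float(np.linalg.norm(w))
    return label, radius


def gmm_certified_accuracy(d, sigma, r):
    """Closed-form certified accuracy at radius r under the assumed mixture.

    d is the distance between the two means. The signed distance of a sample
    to the boundary is N(d / 2, sigma^2), so the answer is Phi((d/2 - r)/sigma).
    """
    x = (d / 2.0 - r) / sigma
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pinsker_slack(eps_kl):
    """Upper bound on total variation when KL divergence is at most eps_kl."""
    return math.sqrt(eps_kl / 2.0)


if __name__ == "__main__":
    mu0, mu1, sigma = np.array([-1.0, 0.0]), np.array([1.0, 0.0]), 1.0
    w, b = fit_linear_rule(mu0, mu1, sigma)
    print(certify(np.array([0.5, 3.0]), w, b))

    # Certified accuracy under the exact mixture, then a lower bound when the
    # true latent distribution is only eps-close in KL (assumed eps = 0.02).
    acc = gmm_certified_accuracy(d=2.0, sigma=sigma, r=0.5)
    print(acc, max(0.0, acc - pinsker_slack(0.02)))
```

The last line reflects the fact that the probability of any event, including "correctly classified with certified radius at least r", differs between two distributions by at most their total variation distance. Transferring a latent-space radius to an input-space guarantee requires additional control over the encoder, which this sketch does not model.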

Problem

Research questions and friction points this paper is trying to address.

Certified Robustness

Adversarial Perturbations

Gaussian Mixture

Latent Space

KL Divergence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Certified Robustness

Gaussian Mixture Model

Latent Space