🤖 AI Summary
This work exposes a security vulnerability in probabilistic robustness certification mechanisms such as randomized smoothing: adversaries can induce certifiers to issue spurious, large robustness radii for incorrect classes via imperceptible input perturbations. To exploit this flaw, the authors propose a region-focused adversarial example generation method that manipulates the certification process while preserving semantic content, thereby inflating the certified radius of a target class beyond that of the true (source) class. The approach achieves an end-to-end bypass of state-of-the-art certified defenses, including DensePure, on ImageNet, causing models to issue high-confidence yet erroneous robustness certificates for adversarial inputs. The results suggest that the practical assurance conveyed by a certificate can be misleading: a large-radius certificate can be obtained for the wrong class, undermining trust in real-world deployments.
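To make the attack surface concrete, the sketch below shows the kind of Monte Carlo certification procedure the summary refers to, in the spirit of randomized smoothing (Cohen et al., 2019): classify many Gaussian-noised copies of the input, lower-bound the top class's probability, and convert that bound into a certified L2 radius. All names are illustrative, and the confidence bound here is a simple Hoeffding bound rather than the exact Clopper-Pearson interval used in practice; it is this radius computation that certificate spoofing inflates for the wrong class.

```python
import math
import statistics
import numpy as np

def certify(classify, x, sigma=0.25, n=1000, alpha=0.001, seed=0):
    """Monte Carlo certification sketch in the spirit of randomized
    smoothing. `classify` maps a batch of inputs to integer labels.
    Returns (top class, certified L2 radius), or (None, 0.0) when the
    smoothed classifier abstains. Illustrative only, not the evaluated
    certifier: the lower confidence bound is a Hoeffding bound, not the
    Clopper-Pearson interval used by the original CERTIFY procedure."""
    rng = np.random.default_rng(seed)
    # Classify n Gaussian-noised copies of the input.
    noisy = x[None, ...] + rng.normal(0.0, sigma, size=(n,) + x.shape)
    labels = classify(noisy)
    counts = np.bincount(labels, minlength=2)
    top = int(np.argmax(counts))
    # Hoeffding lower confidence bound on the top-class probability.
    p_lower = counts[top] / n - math.sqrt(math.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return None, 0.0  # cannot certify: abstain
    # Certified L2 radius: sigma * Phi^{-1}(p_lower).
    return top, sigma * statistics.NormalDist().inv_cdf(p_lower)
```

A spoofed certificate corresponds to this procedure returning the adversary's target class with a large radius, even though the clean input belongs to another class.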
📝 Abstract
Certified defenses promise provable robustness guarantees. We study the malicious exploitation of probabilistic certification frameworks to better understand the limits of those guarantees. The objective is not only to mislead a classifier, but also to manipulate the certification process into issuing a robustness guarantee for an adversarial input: certificate spoofing. A recent ICLR study demonstrated that large perturbations can shift inputs far enough into an incorrect class's region to generate a certificate for that class. We investigate whether the perturbations needed to cause a misclassification, while coaxing a certified model into issuing a deceptively large robustness radius for a target class, can still be made small and imperceptible. We explore region-focused adversarial examples to craft imperceptible perturbations, spoof certificates, and obtain certification radii for the target class that exceed those of the source class, which we call ghost certificates. Extensive evaluations on ImageNet demonstrate the ability to effectively bypass state-of-the-art certified defenses such as DensePure. Our work underscores the need to better understand the limits of robustness certification methods.
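The abstract does not spell out the attack algorithm, but the idea of a small perturbation that pushes an input deep into the target class's region of the smoothed classifier can be illustrated with a generic PGD-style loop. Everything below is an assumption for illustration, not the paper's region-focused method: `grad_target_logprob` is a hypothetical stand-in for the victim model's backward pass, and the noise-averaged gradient is one plausible way to target the vote counts the certifier later measures.

```python
import numpy as np

def spoofing_attack(grad_target_logprob, x, eps=0.03, steps=40,
                    lr=0.005, sigma=0.25, m=16, seed=0):
    """Hedged sketch of a certificate-spoofing perturbation: projected
    gradient ascent under an L-infinity budget `eps` that pushes x toward
    the interior of the target class's decision region of the *smoothed*
    classifier, approximated by averaging gradients over m Gaussian noise
    draws. `grad_target_logprob(x)` (hypothetical) returns the gradient of
    the target-class log-probability at x. This is not the paper's
    algorithm, only an illustration of the idea."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x)
    for _ in range(steps):
        # Average gradients over noisy copies so the step also raises the
        # target-class vote count seen by the Monte Carlo certifier.
        g = np.mean(
            [grad_target_logprob(x + delta + rng.normal(0, sigma, x.shape))
             for _ in range(m)],
            axis=0,
        )
        # Signed ascent step, then projection back into the eps-ball,
        # keeping the perturbation small and imperceptible.
        delta = np.clip(delta + lr * np.sign(g), -eps, eps)
    return x + delta
```

With a small `eps` (e.g. 8/255 on pixel inputs) the perturbation stays visually imperceptible while the certifier, run on the perturbed input, can report a large radius for the adversary's target class.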