Exact Certification of Neural Networks and Partition Aggregation Ensembles against Label Poisoning

📅 2026-04-13
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing methods struggle to provide efficient and tight robustness certificates against label-flipping attacks. This work proposes EnsembleCert, a novel framework that, for the first time, incorporates white-box information from base classifiers into partition-aggregation ensembles. By integrating the neural tangent kernel (NTK)-based ScaLabelCert approach, EnsembleCert enables precise robustness certification against label-flipping attacks on neural networks within polynomial time. The method substantially reduces the required number of partitions while significantly enhancing certification strength: on CIFAR-10, it achieves up to a 26.5% improvement in median certified robustness over current black-box methods and reduces the number of partitions by up to two orders of magnitude.

๐Ÿ“ Abstract
Label-flipping attacks, which corrupt training labels to induce misclassifications at inference, remain a major threat to supervised learning models. This drives the need for robustness certificates that provide formal guarantees about a model's robustness under adversarially corrupted labels. Existing certification frameworks rely on ensemble techniques such as smoothing or partition-aggregation, but treat the corresponding base classifiers as black boxes, yielding overly conservative guarantees. We introduce EnsembleCert, the first certification framework for partition-aggregation ensembles that utilizes white-box knowledge of the base classifiers. Concretely, EnsembleCert yields tighter guarantees than black-box approaches by aggregating per-partition white-box certificates to compute ensemble-level guarantees in polynomial time. To extract white-box knowledge from the base classifiers efficiently, we develop ScaLabelCert, a method that leverages the equivalence between sufficiently wide neural networks and kernel methods using the neural tangent kernel. ScaLabelCert yields the first exact, polynomial-time computable certificate for neural networks against label-flipping attacks. EnsembleCert either matches or significantly outperforms the existing partition-based black-box certificates. For example, on CIFAR-10, our method can certify up to 26.5% more label flips in median over the test set than the existing black-box approach while requiring 100 times fewer partitions, thus challenging the prevailing notion that heavy partitioning is a necessity for strong certified robustness.
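
To make the black-box baseline mentioned in the abstract concrete, here is a minimal sketch of a partition-aggregation certificate in the style of deep partition aggregation. It is not EnsembleCert or ScaLabelCert. It assumes disjoint partitions, so that one flipped label can change the vote of at most one base classifier, and it assumes ties are broken toward the smaller class index. The function name `black_box_certificate` and its arguments are hypothetical.

```python
from collections import Counter


def black_box_certificate(votes, num_classes):
    """Return (predicted_class, certified_flips) for per-partition votes.

    votes: list of class indices, one vote per base classifier (partition).
    num_classes: total number of classes (assumed known).
    """
    counts = Counter(votes)  # missing classes count as 0 votes

    # Plurality vote; ties go to the smaller class index (an assumption).
    pred = min(range(num_classes), key=lambda c: (-counts[c], c))

    # Black-box worst case: every flipped label changes one whole vote,
    # moving it from `pred` to a competitor, which shrinks the gap by 2.
    gaps = [
        counts[pred] - counts[c] - (1 if c < pred else 0)
        for c in range(num_classes)
        if c != pred
    ]
    certified_flips = min(gaps) // 2
    return pred, certified_flips


# Ten partitions: seven vote for class 0, two for class 1, one for class 2.
pred, radius = black_box_certificate([0] * 7 + [1, 1, 2], num_classes=3)
print(pred, radius)
```

The worst-case assumption in the gap computation is what makes this certificate conservative: it charges one lost vote per flipped label, even when a base classifier would in fact tolerate several flips before changing its prediction. Per-partition white-box certificates, as described in the abstract, are what allow that slack to be recovered.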
Problem

Research questions and friction points this paper is trying to address.

label-flipping attacks
robustness certification
partition-aggregation ensembles
neural networks
white-box certification
Innovation

Methods, ideas, or system contributions that make the work stand out.

white-box certification
partition-aggregation ensemble
label-flipping attack
neural tangent kernel
exact robustness certificate