AI Summary
Existing methods struggle to provide efficient and tight robustness certificates against label-flipping attacks. This work proposes EnsembleCert, the first framework to incorporate white-box information from the base classifiers into certificates for partition-aggregation ensembles. To obtain that information, it uses ScaLabelCert, a neural tangent kernel (NTK)-based method that computes exact robustness certificates against label-flipping attacks for sufficiently wide neural networks in polynomial time. The framework substantially reduces the required number of partitions while strengthening the certificates: on CIFAR-10, it certifies up to 26.5% more label flips in the median over the test set than the existing black-box approach, while using up to two orders of magnitude fewer partitions.
Abstract
Label-flipping attacks, which corrupt training labels to induce misclassifications at inference, remain a major threat to supervised learning models. This drives the need for robustness certificates that provide formal guarantees about a model's robustness under adversarially corrupted labels. Existing certification frameworks rely on ensemble techniques such as smoothing or partition-aggregation, but treat the corresponding base classifiers as black boxes, yielding overly conservative guarantees. We introduce EnsembleCert, the first certification framework for partition-aggregation ensembles that utilizes white-box knowledge of the base classifiers. Concretely, EnsembleCert yields tighter guarantees than black-box approaches by aggregating per-partition white-box certificates to compute ensemble-level guarantees in polynomial time. To extract white-box knowledge from the base classifiers efficiently, we develop ScaLabelCert, a method that leverages the equivalence between sufficiently wide neural networks and kernel methods using the neural tangent kernel. ScaLabelCert yields the first exact, polynomial-time computable certificate for neural networks against label-flipping attacks. EnsembleCert either matches or significantly outperforms the existing partition-based black-box certificates. For example, on CIFAR-10, our method certifies up to 26.5% more label flips in the median over the test set than the existing black-box approach while requiring 100 times fewer partitions, challenging the prevailing notion that heavy partitioning is necessary for strong certified robustness.
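To illustrate why per-partition white-box certificates can tighten an ensemble-level guarantee, the following is a minimal sketch for a simplified binary-classification setting. It is not the EnsembleCert algorithm from the paper. The function names, the conservative treatment of ties as attack successes, and the assumption that each base classifier comes with its own certified number of tolerated label flips are all illustrative assumptions. In this sketch, a black-box certificate assumes a single label flip can change any base classifier's vote, whereas the white-box variant charges the attacker at least `r + 1` flips to change a classifier that is certified against `r` flips.

```python
import math


def votes_to_flip(votes_top: int, votes_other: int) -> int:
    """Smallest number of top-class votes an attacker must change so that the
    other class ties or wins (ties count as attack success, conservatively)."""
    return math.ceil((votes_top - votes_other) / 2)


def blackbox_certificate(votes_top: int, votes_other: int) -> int:
    """Certified number of label flips when one flip is assumed to be enough
    to change the vote of whichever partition it lands in."""
    return max(votes_to_flip(votes_top, votes_other) - 1, 0)


def whitebox_certificate(radii_top: list[int], votes_other: int) -> int:
    """Certified number of label flips when each top-class base classifier i
    is individually certified against radii_top[i] flips (hypothetical input).

    Changing classifier i costs at least radii_top[i] + 1 flips, so the
    cheapest attack targets the k least robust top-class classifiers.
    """
    k = votes_to_flip(len(radii_top), votes_other)
    costs = sorted(r + 1 for r in radii_top)
    return max(sum(costs[:k]) - 1, 0)


if __name__ == "__main__":
    # Five partitions vote for the predicted class, two for the other class.
    radii = [0, 3, 5, 2, 4]  # hypothetical per-partition certified radii
    print("black-box:", blackbox_certificate(len(radii), 2))
    print("white-box:", whitebox_certificate(radii, 2))
```

When every per-partition radius is zero, the two functions agree, which matches the intuition that the white-box view can only help when the base classifiers are themselves robust to some label flips. The multi-class case and the computation of the per-partition radii are outside the scope of this sketch.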