Learning Better Certified Models from Empirically-Robust Teachers

📅 2026-02-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a feature-space knowledge distillation framework to reduce the trade-off between certified robustness and standard accuracy in neural networks. Existing certified training methods often degrade standard performance substantially. The proposed approach combines an empirically robust, adversarially trained teacher with relaxation-based certified training, so that a ReLU student network learns feature representations that support both high accuracy and strong certified robustness. Experiments on several robust computer vision benchmarks show that distilling from the teacher consistently improves on state-of-the-art certified training methods.

📝 Abstract

Adversarial training attains strong empirical robustness to specific adversarial attacks by training on concrete adversarial perturbations, but it produces neural networks that are not amenable to strong robustness certificates through neural network verification. On the other hand, earlier certified training schemes directly train on bounds from network relaxations to obtain models that are certifiably robust but display sub-par standard performance. Recent work has shown that state-of-the-art trade-offs between certified robustness and standard performance can be obtained through a family of losses combining adversarial outputs and neural network bounds. Nevertheless, unlike empirical robustness, verifiability still comes at a significant cost in standard performance. In this work, we propose to leverage empirically-robust teachers to improve the performance of certifiably-robust models through knowledge distillation. Using a versatile feature-space distillation objective, we show that distillation from adversarially-trained teachers consistently improves on the state-of-the-art in certified training for ReLU networks across a series of robust computer vision benchmarks.
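
The abstract does not spell out the distillation objective, so the following is only a minimal sketch of how a feature-space distillation term could be combined with bounds from a network relaxation. It assumes interval bound propagation (IBP) as the relaxation, a single linear + ReLU layer as the student's feature extractor, and a squared L2 distance to a fixed teacher feature vector. All function names (`ibp_linear`, `ibp_relu`, `student_feature_bounds`, `worst_case_feature_distill_loss`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def ibp_linear(lower, upper, W, b):
    """Propagate the box [lower, upper] through x -> W @ x + b (IBP)."""
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius  # |W| maps input radii to output radii
    return new_center - new_radius, new_center + new_radius


def ibp_relu(lower, upper):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)


def student_feature_bounds(x, eps, W, b):
    """Bounds on student features over the L-infinity ball of radius eps.

    Assumption: the student feature extractor is one linear + ReLU layer.
    """
    lower, upper = ibp_linear(x - eps, x + eps, W, b)
    return ibp_relu(lower, upper)


def worst_case_feature_distill_loss(lower, upper, teacher_feat):
    """Largest squared L2 distance between the teacher feature and any
    student feature inside the box [lower, upper].

    Assumption: this stands in for a certified feature-space distillation
    term; the paper's actual objective may differ.
    """
    worst_gap = np.maximum(np.abs(upper - teacher_feat),
                           np.abs(lower - teacher_feat))
    return float(np.sum(worst_gap ** 2))
```

In a training loop, a term of this kind would typically be added to a certified or adversarial classification loss with a weighting coefficient. With `eps = 0` the box collapses to a point and the term reduces to ordinary feature matching against the teacher.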

Problem

Research questions and friction points this paper is trying to address.

certified robustness

standard performance

adversarial training

neural network verification

knowledge distillation

Innovation

Methods, ideas, or system contributions that make the work stand out.

certified robustness

adversarial training

knowledge distillation