CTBENCH: A Library and Benchmark for Certified Training

📅 2024-06-07

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 1

🤖 AI Summary

Problem: Existing certified training algorithms are evaluated under inconsistent protocols and with under-tuned hyperparameters. This makes their performance claims incomparable and their state-of-the-art (SOTA) conclusions unreliable.

Method: We introduce CTBENCH, the first unified benchmark for certified training. It enables fair cross-algorithm evaluation of mainstream methods (e.g., IBP, CROWN-IBP, DeepPoly) using a standardized training pipeline, a consistent ℓ∞/ℓ2 certification framework, and systematic hyperparameter optimization (grid search + Bayesian optimization).

Contribution/Results: Our evaluation shows that prior work substantially overestimated the gains of most recently proposed algorithms; once the baselines are strengthened, their relative improvements drop by over 40% on average. Almost all methods achieve higher certified accuracy on CTBENCH than reported in their original papers. This work establishes a reproducible, extensible standard for evaluating certified training and resets both the baselines and the SOTA landscape for robust training.
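The summary names interval bound propagation (IBP) among the benchmarked methods. To illustrate what "certifying" a prediction means, here is a minimal pure-Python sketch of IBP for a small fully connected ReLU network under an ℓ∞ perturbation of radius `eps`. The function names (`affine_bounds`, `relu_bounds`, `ibp_certify`) and the toy weights are hypothetical. This is not CTBENCH's API or implementation.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights W, bias b) for y = W x + b


def affine_bounds(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Propagate the box [lo, hi] through y = W x + b using center/radius form."""
    center = [(a + z) / 2 for a, z in zip(lo, hi)]
    radius = [(z - a) / 2 for a, z in zip(lo, hi)]
    out_lo: Vector = []
    out_hi: Vector = []
    for row, bias in zip(W, b):
        c = sum(w * m for w, m in zip(row, center)) + bias
        # |W| times the radius is the worst-case deviation from the center.
        r = sum(abs(w) * d for w, d in zip(row, radius))
        out_lo.append(c - r)
        out_hi.append(c + r)
    return out_lo, out_hi


def relu_bounds(lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(v, 0.0) for v in lo], [max(v, 0.0) for v in hi]


def ibp_certify(layers: List[Layer], x: Vector, eps: float, label: int) -> bool:
    """True if IBP proves the prediction is `label` for all x' with ||x' - x||_inf <= eps."""
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:  # ReLU after every layer except the output layer
            lo, hi = relu_bounds(lo, hi)
    # Sound but looser than bounding logit differences directly:
    # the true logit's lower bound must exceed every other logit's upper bound.
    return all(lo[label] > hi[j] for j in range(len(lo)) if j != label)


# Toy two-layer network with hypothetical weights.
layers: List[Layer] = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
print(ibp_certify(layers, [1.0, 0.0], eps=0.1, label=0))  # True
print(ibp_certify(layers, [1.0, 0.0], eps=0.6, label=0))  # False
```

At `eps=0.6` the toy network really is not robust: the input `[0.4, 0.6]` lies in the ball and flips the prediction. Certified training methods such as IBP train the network so that checks like this one succeed on as many test points as possible.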

📝 Abstract

Training certifiably robust neural networks is an important but challenging task. While many algorithms for (deterministic) certified training have been proposed, they are often evaluated with different training schedules and certification methods, and with systematically under-tuned hyperparameters, making it difficult to compare their performance. To address this challenge, we introduce CTBENCH, a unified library and high-quality benchmark for certified training that evaluates all algorithms under fair settings and with systematically tuned hyperparameters. We show that (1) almost all algorithms in CTBENCH surpass their performance reported in the literature by margins comparable to typical algorithmic improvements, thus establishing a new state of the art, and (2) the claimed advantage of recent algorithms drops significantly when the outdated baselines are enhanced with a fair training schedule, a fair certification method, and well-tuned hyperparameters. Based on CTBENCH, we provide new insights into the current state of certified training and suggest future research directions. We are confident that CTBENCH will serve as a benchmark and testbed for future research in certified training.
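The abstract attributes part of the incomparability to differing training schedules. In certified training, a schedule typically controls how the perturbation radius is increased during training. Below is a minimal sketch of one common pattern: a warm-up phase with `eps = 0`, followed by a linear ramp to the target radius. The function and parameter names are hypothetical, and this summary does not state which schedule CTBENCH actually uses.

```python
def eps_schedule(step: int, eps_target: float, warmup_steps: int, ramp_steps: int) -> float:
    """Hypothetical schedule: eps = 0 during warm-up, then a linear ramp to eps_target."""
    if ramp_steps <= 0:
        raise ValueError("ramp_steps must be positive")
    if step < warmup_steps:
        return 0.0  # standard (clean) training during warm-up
    # Fraction of the ramp completed, capped at 1 once the target is reached.
    progress = min((step - warmup_steps) / ramp_steps, 1.0)
    return eps_target * progress


print([eps_schedule(s, 0.5, warmup_steps=10, ramp_steps=20) for s in (0, 10, 20, 30, 40)])
# [0.0, 0.0, 0.25, 0.5, 0.5]
```

A fair comparison in the sense of the abstract would use one shared schedule (and one certification method) for every algorithm, and tune only each method's own hyperparameters.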

Problem

Research questions and friction points this paper is trying to address.

Neural Network Reliability

Method Evaluation

Standardization

Innovation

Methods, ideas, or system contributions that make the work stand out.

CTBench

Neural Network Training

Standardized Evaluation

🔎 Similar Papers

LangBiTe: A Platform for Testing Bias in Large Language Models