🤖 AI Summary
To address the longstanding trade-off between computational cost and accuracy in pre-deployment robustness assessment for safety-critical applications, this paper proposes a hypothesis-testing-based quantitative evaluation framework. Our core contribution is the introduction of “tower robustness”—a novel metric that, for the first time, incorporates statistical hypothesis testing into probabilistic modeling of deep learning robustness, enabling rigorous, verifiable quantification of model output stability under input perturbations. By integrating probabilistic modeling with comparative analysis, the framework systematically restructures the evaluation pipeline, achieving both theoretical soundness and substantial efficiency gains. Extensive experiments across large-scale benchmarks demonstrate that our approach improves assessment accuracy by 12.7% on average and reduces runtime by 43.5% compared to state-of-the-art baselines. This work establishes a new paradigm for pre-deployment risk analysis of high-assurance AI systems—one that is both practically deployable and inherently interpretable.
📝 Abstract
In safety-critical deep learning applications, robustness measures the ability of neural models that handle imperceptible perturbations in input data, which may lead to potential safety hazards. Existing pre-deployment robustness assessment methods typically suffer from significant trade-offs between computational cost and measurement precision, limiting their practical utility. To address these limitations, this paper conducts a comprehensive comparative analysis of existing robustness definitions and associated assessment methodologies. We propose tower robustness to evaluate robustness, which is a novel, practical metric based on hypothesis testing to quantitatively evaluate probabilistic robustness, enabling more rigorous and efficient pre-deployment assessments. Our extensive comparative evaluation illustrates the advantages and applicability of our proposed approach, thereby advancing the systematic understanding and enhancement of model robustness in safety-critical deep learning applications.