Why the Maximum Second Derivative of Activations Matters for Adversarial Robustness

📅 2026-03-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the impact of activation function curvature—quantified by the maximum absolute second derivative $\max|\sigma''|$—on adversarial robustness. To this end, the authors introduce RCT-AF, a family of recursively defined activation functions with tunable curvature, and conduct a systematic analysis across diverse network architectures, datasets, and adversarial training settings. They reveal, for the first time, a universal non-monotonic relationship between $\max|\sigma''|$ and adversarial robustness: optimal robustness is consistently achieved when $\max|\sigma''| \in [4,10]$. Furthermore, they observe a U-shaped trend in the diagonal norm of the loss Hessian, with its minimum coinciding with peak robust performance, thereby establishing a theoretical link between activation curvature and the geometry of optimization.

📝 Abstract
This work investigates the critical role of activation function curvature -- quantified by the maximum absolute second derivative $\max|\sigma''|$ -- in adversarial robustness. Using the Recursive Curvature-Tunable Activation Family (RCT-AF), which enables precise control over curvature through parameters $\alpha$ and $\beta$, we systematically analyze this relationship. Our study reveals a fundamental trade-off: insufficient curvature limits model expressivity, while excessive curvature amplifies the normalized Hessian diagonal norm of the loss, leading to sharper minima that hinder robust generalization. The result is a non-monotonic relationship in which optimal adversarial robustness consistently occurs when $\max|\sigma''|$ falls within $[4, 10]$, a finding that holds across diverse network architectures, datasets, and adversarial training methods. We provide theoretical insights into how activation curvature affects the diagonal elements of the Hessian matrix of the loss, and experimentally demonstrate that the normalized Hessian diagonal norm exhibits a U-shaped dependence on $\max|\sigma''|$, with its minimum falling within the optimal robustness range, thereby validating the proposed mechanism.
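The abstract does not give the closed form of RCT-AF, so as a hedged illustration of the central quantity, the sketch below estimates $\max|\sigma''|$ for an arbitrary activation via central finite differences; softplus is used only as a stand-in activation, not as the paper's method. The function name `max_abs_second_derivative` and the scan interval are assumptions for this example.

```python
import numpy as np

def max_abs_second_derivative(sigma, lo=-10.0, hi=10.0, n=100_001):
    """Numerically estimate max|sigma''| over [lo, hi] via central differences."""
    x = np.linspace(lo, hi, n)
    h = x[1] - x[0]
    y = sigma(x)
    # Central second difference: (y[i+1] - 2*y[i] + y[i-1]) / h^2
    d2 = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / h**2
    return np.abs(d2).max()

# Stand-in activation: softplus(x) = log(1 + e^x), computed stably.
# Its second derivative is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25.
softplus = lambda x: np.logaddexp(0.0, x)
print(max_abs_second_derivative(softplus))  # ≈ 0.25
```

A curvature-tunable family would expose knobs (the paper's $\alpha$ and $\beta$) that move this scalar into or out of the reported optimal band $[4, 10]$; this routine only measures it.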
Problem

Research questions and friction points this paper is trying to address.

adversarial robustness
activation curvature
second derivative
Hessian diagonal norm
robust generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

activation curvature
adversarial robustness
second derivative
Hessian diagonal norm
RCT-AF