🤖 AI Summary
This work addresses the problem of verifying softmax-based attention in Transformers under interval constraints on the pre-softmax scores, where existing verifiers relax softmax independently of the downstream objective and therefore produce overly conservative bounds. The authors introduce Vertex-Softmax, a primitive that establishes, for the first time, that the exact optimum of this score-box problem is always attained at a vertex of the constraint box. Building on this, they prove a threshold structure theorem: after sorting the objective coefficients, the optimum lies among only linearly many candidates, which gives log-linear complexity in the sequence length and the tightest sound bound obtainable from score intervals alone. Integrated into a CROWN-style convex relaxation verifier, the method carries a formal soundness guarantee. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 attention models show significantly improved certified rates and substantially tighter lower bounds, matching or outperforming alpha-CROWN and branch-and-bound baselines at a fraction of their computational cost.
📝 Abstract
Certified verification of transformer attention requires bounding the softmax function over interval constraints on the pre-softmax scores. Existing verifiers relax softmax independently of the downstream objective, leaving avoidable slack. We prove that the exact optimum of this score-box problem is attained at a vertex of the constraint box, and establish a threshold structure theorem showing that, after sorting the objective coefficients, the optimum lies among only linearly many candidates, yielding the Vertex-Softmax primitive with log-linear complexity in the sequence length. We further prove a formal optimality result showing that Vertex-Softmax is the tightest sound bound obtainable from score intervals alone, characterizing precisely what additional structure (score correlations, score-value coupling) is needed for further improvement. Integrated into a CROWN (Convex Relaxation based Optimization for Worst-case Neurons)-style verifier with a formal soundness guarantee, Vertex-Softmax significantly improves certified rates and substantially tightens lower bounds across MNIST, Fashion-MNIST, and CIFAR-10 attention models, while consistently matching or outperforming alpha-CROWN and branch-and-bound baselines at a fraction of their cost.
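To make the vertex and threshold claims concrete, the following is a minimal sketch, not the authors' implementation. It assumes the score-box problem takes the form of maximizing a linear objective c · softmax(z) over a box lo ≤ z ≤ hi, and that the threshold candidates are "the k largest coefficients take their upper score bound, the rest take their lower bound" for k = 0, ..., n. The function names `softmax_objective`, `vertex_softmax_upper`, and `vertex_softmax_lower` are hypothetical.

```python
import math

def softmax_objective(c, z):
    """Evaluate c . softmax(z) with the usual max-shift for stability."""
    m = max(z)
    w = [math.exp(v - m) for v in z]
    return sum(ci * wi for ci, wi in zip(c, w)) / sum(w)

def vertex_softmax_upper(c, lo, hi):
    """Upper bound on max over lo <= z <= hi of c . softmax(z).

    Assumed threshold structure: sort coefficients in descending order;
    candidate k puts the k largest coefficients at their upper score
    bound and the rest at their lower bound. Sorting costs O(n log n);
    the n + 1 candidates are then scanned with running sums in O(n).
    """
    n = len(c)
    order = sorted(range(n), key=lambda i: c[i], reverse=True)
    shift = max(hi)  # common shift so every exponent is <= 0
    # Assumption: intervals are narrow enough that exp() does not
    # underflow to zero for every lower bound at once.
    e_lo = [math.exp(lo[i] - shift) for i in order]
    e_hi = [math.exp(hi[i] - shift) for i in order]
    cs = [c[i] for i in order]

    # Candidate k = 0: every score at its lower bound.
    num = sum(ci * e for ci, e in zip(cs, e_lo))
    den = sum(e_lo)
    best = num / den
    # Move one more coordinate to its upper bound per step.
    for k in range(n):
        delta = e_hi[k] - e_lo[k]
        num += cs[k] * delta
        den += delta
        best = max(best, num / den)
    return best

def vertex_softmax_lower(c, lo, hi):
    """Lower bound on the minimum, obtained by negating the objective."""
    return -vertex_softmax_upper([-ci for ci in c], lo, hi)
```

The reasoning behind the sketch is that the partial derivative of c · softmax(z) with respect to z_i is p_i (c_i − c · p), where p = softmax(z), so at a maximizer each score sits at its upper bound when its coefficient exceeds the optimal value and at its lower bound otherwise. For small n the result can be checked against a brute-force scan of all 2^n box vertices.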