🤖 AI Summary
Neural network formal verification often suffers from overly fine-grained specifications that hinder scalability and interpretability.
Method: This paper introduces the *Minimal Neural Activation Pattern (NAP)* as a generalizable robustness specification, defined as the smallest subset of neurons whose activation states suffice to guarantee model robustness. The authors formulate the minimal NAP problem and propose three approaches to solve it: conservative (sound but incomplete), statistical (based on significance testing), and optimistic (approximate, verification-free). The optimistic method efficiently uncovers potential causal relationships between neuron activations and robustness in large-scale vision models without invoking formal verifiers, combining neural activation analysis, heuristic pruning, statistical significance testing, and optimistic approximate reasoning.
Contribution/Results: Experiments demonstrate that minimal NAPs require only ~1% of the neurons used by baseline methods, while expanding verifiable regions by several orders of magnitude—substantially improving both the verifiability and the mechanistic interpretability of neural robustness.
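To make the minimal-NAP idea concrete, here is a toy sketch (not the paper's algorithm): a NAP records, for each hidden neuron, whether it is active (post-ReLU output > 0) on a reference input, and a minimal NAP is found by pruning neurons that a verifier confirms are unnecessary. The `verifies` oracle and the `ESSENTIAL` set below are purely hypothetical stand-ins for a real formal verifier; the greedy loop illustrates the flavor of the conservative approach.

```python
import numpy as np

# Toy one-hidden-layer network: the NAP is the boolean activation
# pattern of the 8 hidden neurons on a reference input.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
x_ref = rng.normal(size=4)
nap = np.maximum(W @ x_ref, 0) > 0   # True = neuron active

# Hypothetical verification oracle: here we pretend only neurons
# {0, 3} matter for robustness; a real system would query a formal
# verifier (e.g. Marabou) instead.
ESSENTIAL = {0, 3}
def verifies(neurons):
    return ESSENTIAL <= set(neurons)

# Greedy pruning in the spirit of the conservative approach: drop a
# neuron whenever the remaining pattern still verifies.
kept = list(range(len(nap)))
for n in list(kept):
    trial = [m for m in kept if m != n]
    if verifies(trial):
        kept = trial

print(sorted(kept))  # prints [0, 3], a minimal sufficient subset
```

Each pruning step costs one verifier call, which is why the paper also explores statistical and optimistic variants that avoid or batch such calls.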
📝 Abstract
Formal verification is only as good as the specification of a system, which is also true for neural network verification. Existing specifications follow the paradigm of data as specification, where the local neighborhood around a reference data point is considered correct or robust. While these specifications provide a fair testbed for assessing model robustness, they are too restrictive for verifying any unseen test data points, a challenging task with significant real-world implications. Recent work shows great promise through a new paradigm, neural representation as specification, which uses neural activation patterns (NAPs) for this purpose. However, it computes the most refined NAPs, which include many redundant neurons. In this paper, we study the following problem: Given a neural network, find a minimal (general) NAP specification that is sufficient for formal verification of its robustness properties. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which set of neurons contributes to the model's robustness. To address this problem, we propose three approaches: conservative, statistical, and optimistic. Each of these methods offers distinct strengths and trade-offs in terms of minimality and computational speed, making them suitable for scenarios with different priorities. Notably, the optimistic approach can probe potential causal links between neurons and the robustness of large vision neural networks without relying on verification tools, a task existing methods struggle to scale. Our experiments show that minimal NAP specifications use far fewer neurons than those from previous work while expanding verifiable boundaries by several orders of magnitude.
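The optimistic approach described above avoids verifier calls entirely. A hedged sketch of that idea (the scoring rule below is an illustrative assumption, not the paper's exact procedure): rank neurons by how consistently they fire, or stay silent, on inputs empirically observed to be robust, and keep only the most consistent ones as NAP candidates.

```python
import numpy as np

# Synthetic data: 200 samples x 16 hidden neurons (True = active),
# plus a boolean mask of which samples behaved robustly.
rng = np.random.default_rng(1)
acts = rng.random((200, 16)) > 0.5
robust = rng.random(200) > 0.3

# Activation frequency among robust samples; neurons near 1.0
# (always on) or 0.0 (always off) have a consistent state and are
# candidates for the NAP specification.
freq = acts[robust].mean(axis=0)
consistency = np.abs(freq - 0.5) * 2   # 0 = uninformative, 1 = fully consistent

# Optimistically keep the top-4 most consistent neurons, with no
# verifier in the loop.
candidates = np.argsort(consistency)[::-1][:4]
print(sorted(candidates.tolist()))
```

Because no verification is performed, the resulting set is only a candidate specification; the statistical approach would additionally apply significance tests before trusting such a ranking.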