🤖 AI Summary
Existing provably faithful explanation methods provide formal guarantees but suffer from high computational overhead and poor scalability. To address this, we propose an abstraction-refinement framework: first, structurally abstract the original neural network to rapidly generate sufficient explanations; then, iteratively refine the abstract model on demand using neural network verification techniques, ensuring that explanations remain provably valid on the original network. This work is the first to introduce model abstraction into provable explanation generation, achieving a principled trade-off between efficiency and formal soundness. The framework supports multi-granularity attribution analysis, revealing how robust predictions are across abstraction levels. Experimental evaluation demonstrates speedups of several-fold to over an order of magnitude in computing provably sufficient explanations, while maintaining 100% verification accuracy across multiple benchmarks.
📝 Abstract
Despite significant advancements in post-hoc explainability techniques for neural networks, many current methods rely on heuristics and do not provide formally provable guarantees for the explanations they produce. Recent work has shown that explanations with formal guarantees can be obtained by using neural network verification techniques to identify subsets of input features that are sufficient to ensure the prediction remains unchanged. Despite the appeal of such explanations, their computation faces significant scalability challenges. In this work, we address this gap by proposing a novel abstraction-refinement technique for efficiently computing provably sufficient explanations of neural network predictions. Our method abstracts the original large neural network by constructing a substantially reduced network in which a sufficient explanation is also provably sufficient for the original network, thereby significantly speeding up the verification process. If the explanation is insufficient on the reduced network, we iteratively refine the abstraction by gradually increasing the network's size until convergence. Our experiments demonstrate that our approach improves the efficiency of obtaining provably sufficient explanations for neural network predictions while additionally providing a fine-grained interpretation of the network's predictions across different abstraction levels.
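To make the abstraction-refinement loop concrete, here is a minimal Python sketch of the workflow the abstract describes, under stated assumptions: every helper below (`abstract`, `compute_explanation`, `is_sufficient`) is a hypothetical placeholder standing in for the paper's actual abstraction, explanation, and verification machinery, not the authors' API.

```python
"""Minimal sketch of the abstraction-refinement loop for provably
sufficient explanations. All helpers are illustrative placeholders."""

from typing import Callable, List, Optional, Set


def abstract(network: Callable, size: int) -> Callable:
    """Placeholder: build a reduced network of roughly `size` neurons whose
    sufficient explanations provably transfer to the original network."""
    return network  # a real implementation would merge/remove neurons


def compute_explanation(network: Callable, x: List[float]) -> Set[int]:
    """Placeholder: propose a candidate subset of input features."""
    return set(range(len(x) // 2))  # e.g., keep the first half of the features


def is_sufficient(network: Callable, x: List[float], subset: Set[int]) -> bool:
    """Placeholder for a formal verification query: does fixing the features
    in `subset` guarantee the prediction is unchanged for every completion
    of the remaining features? A real check would call a verifier."""
    return True


def explain(network: Callable, x: List[float],
            size: int, max_size: int) -> Optional[Set[int]]:
    """Verify on a small abstract network first; grow it only on demand."""
    while size <= max_size:
        reduced = abstract(network, size)
        candidate = compute_explanation(reduced, x)
        if is_sufficient(reduced, x, candidate):
            # Soundness of the abstraction: sufficiency on the reduced
            # network implies sufficiency on the original network.
            return candidate
        size *= 2  # refine: gradually increase the abstract network's size
    # Fall back to verifying directly on the original network.
    candidate = compute_explanation(network, x)
    return candidate if is_sufficient(network, x, candidate) else None
```

The key design point mirrored here is that verification queries are issued against the cheap reduced network whenever possible, and the full-size (expensive) query is reached only if refinement fails to converge earlier.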