🤖 AI Summary
Existing methods for certifying the stability of feature attributions rely on smoothed classifiers and yield conservative, hard-to-interpret probabilistic guarantees. Method: the paper introduces “soft stability,” which relaxes the conventional all-or-nothing (Boolean) notion of stability, and proposes SCA, a model-agnostic, sample-efficient stability certification algorithm grounded in probabilistic smoothing and Boolean function analysis; mild smoothing then enables a graceful tradeoff between stability and attribution accuracy. Contribution/Results: SCA delivers non-trivial, interpretable probabilistic guarantees for any attribution method without strong assumptions. Experiments on vision and language tasks show that SCA makes stability certification more practical and interpretable, and that soft stability reflects attribution robustness more faithfully than its binary counterpart.
📝 Abstract
Stability guarantees are an emerging tool for evaluating feature attributions, but existing certification methods rely on smoothed classifiers and often yield conservative guarantees. To address these limitations, we introduce soft stability and propose a simple, model-agnostic, and sample-efficient stability certification algorithm (SCA) that provides non-trivial and interpretable guarantees for any attribution. Moreover, we show that mild smoothing enables a graceful tradeoff between accuracy and stability, in contrast to prior certification methods that require a more aggressive compromise. Using Boolean function analysis, we give a novel characterization of stability under smoothing. We evaluate SCA on vision and language tasks, and demonstrate the effectiveness of soft stability in measuring the robustness of explanation methods.
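The abstract does not spell out SCA's procedure, so the following is only a rough illustration of what a sample-efficient, model-agnostic stability certificate can look like: estimate how often a prediction survives random perturbations of an attribution mask, then pair the estimate with a high-probability lower bound. The interface (`predict`, `perturb`) and the use of Hoeffding's inequality are assumptions for the sketch, not the paper's actual method.

```python
import math
import random

def soft_stability_estimate(predict, x, mask, perturb, n_samples=1000, seed=0):
    """Monte Carlo sketch of a sample-based stability estimate (hypothetical
    interface, not the paper's SCA). `predict(x, mask)` returns the model's
    label for the masked input; `perturb(mask, rng)` returns a randomly
    perturbed mask. Returns the fraction of perturbations under which the
    prediction is unchanged."""
    rng = random.Random(seed)
    base = predict(x, mask)
    agree = sum(predict(x, perturb(mask, rng)) == base for _ in range(n_samples))
    return agree / n_samples

def hoeffding_lower_bound(p_hat, n, delta=0.05):
    """Lower bound on the true stability rate that holds with probability
    at least 1 - delta, via Hoeffding's inequality."""
    return max(0.0, p_hat - math.sqrt(math.log(1.0 / delta) / (2.0 * n)))
```

Because the estimate is a mean of i.i.d. Bernoulli samples, the number of model queries needed for a given bound width depends only on `n_samples` and `delta`, not on the model's internals, which is one plausible reading of "model-agnostic and sample-efficient."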