🤖 AI Summary
Existing methods for certifying the stability of feature attributions rely on smoothed classifiers and yield conservative, hard-to-interpret probabilistic guarantees. Method: the paper introduces “soft stability,” which relaxes the conventional all-or-nothing (Boolean) notion of stability, and proposes SCA, a model-agnostic, sample-efficient stability certification algorithm grounded in probabilistic smoothing and Boolean function analysis; mild smoothing then enables a graceful tradeoff between stability and attribution accuracy. Contribution/Results: SCA delivers non-trivial, interpretable probabilistic guarantees for any attribution method without strong assumptions. Experiments on vision and language tasks show that SCA makes stability certification more practical and interpretable, and that soft stability reflects attribution robustness more faithfully than its binary counterpart.
📝 Abstract
Stability guarantees are an emerging tool for evaluating feature attributions, but existing certification methods rely on smoothed classifiers and often yield conservative guarantees. To address these limitations, we introduce soft stability and propose a simple, model-agnostic, and sample-efficient stability certification algorithm (SCA) that provides non-trivial and interpretable guarantees for any attribution. Moreover, we show that mild smoothing enables a graceful tradeoff between accuracy and stability, in contrast to prior certification methods that require a more aggressive compromise. Using Boolean function analysis, we give a novel characterization of stability under smoothing. We evaluate SCA on vision and language tasks, and demonstrate the effectiveness of soft stability in measuring the robustness of explanation methods.
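The abstract does not spell out SCA's procedure, so the following is only a rough illustration of what a sample-efficient, model-agnostic stability certificate can look like: estimate how often a prediction survives random perturbations of an attribution mask, then pair the estimate with a high-probability lower bound. The interface (`predict`, `perturb`) and the use of Hoeffding's inequality are assumptions for the sketch, not the paper's actual method.

```python
import math
import random

def soft_stability_estimate(predict, x, mask, perturb, n_samples=1000, seed=0):
    """Monte Carlo sketch of a sample-based stability estimate (hypothetical
    interface, not the paper's SCA). `predict(x, mask)` returns the model's
    label for the masked input; `perturb(mask, rng)` returns a randomly
    perturbed mask. Returns the fraction of perturbations under which the
    prediction is unchanged."""
    rng = random.Random(seed)
    base = predict(x, mask)
    agree = sum(predict(x, perturb(mask, rng)) == base for _ in range(n_samples))
    return agree / n_samples

def hoeffding_lower_bound(p_hat, n, delta=0.05):
    """Lower bound on the true stability rate that holds with probability
    at least 1 - delta, via Hoeffding's inequality."""
    return max(0.0, p_hat - math.sqrt(math.log(1.0 / delta) / (2.0 * n)))
```

Because the estimate is a mean of i.i.d. Bernoulli samples, the number of model queries needed for a given bound width depends only on `n_samples` and `delta`, not on the model's internals, which is one plausible reading of "model-agnostic and sample-efficient."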