🤖 AI Summary
Model updates—such as hyperparameter tuning, architectural modifications, or data shifts—frequently induce behavioral changes, yet the underlying causes remain poorly understood. This paper introduces Delta-Attribution, the first model-agnostic framework for attribution difference analysis: it quantifies discrepancies in feature attributions between model versions A and B to pinpoint factors driving behavioral shifts. Our method employs efficient masking/clamping in normalized space, class-anchored marginal attribution, and baseline-averaging to ensure robustness and fidelity. We further propose a multi-dimensional evaluation protocol that discriminates substantive bias-induced attribution migration from superficial parameter adjustments. Evaluated across 45 diverse update scenarios, Delta-Attribution accurately identifies bias-driven, behaviorally aligned attribution shifts while remaining invariant to “decorative” changes—e.g., minor hyperparameter tweaks or non-functional architectural refinements. The framework thus provides an interpretable, diagnostic tool for tracing attribution evolution during model development and deployment.
📝 Abstract
Model updates (new hyperparameters, kernels, depths, solvers, or data) change performance, but the *reason* often remains opaque. We introduce **Delta-Attribution** (Δ-Attribution), a model-agnostic framework that explains *what changed* between versions A and B by differencing per-feature attributions: Δφ(x) = φ_B(x) − φ_A(x). We evaluate Δφ with a *Δ-Attribution Quality Suite* covering magnitude/sparsity (L1, Top-k, entropy), agreement/shift (rank-overlap@10, Jensen–Shannon divergence), behavioural alignment (Delta Conservation Error, DCE; Behaviour–Attribution Coupling, BAC; COΔF), and robustness (noise, baseline sensitivity, grouped occlusion).
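The differencing and the agreement/shift metrics can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the attribution vectors `phi_a`/`phi_b` are invented stand-ins, and the exact normalisation the authors use for the Jensen–Shannon divergence may differ.

```python
import numpy as np

def rank_overlap_at_k(phi_a, phi_b, k=10):
    """Fraction of features shared by the top-k of |phi_a| and |phi_b|."""
    top_a = set(np.argsort(-np.abs(phi_a))[:k])
    top_b = set(np.argsort(-np.abs(phi_b))[:k])
    return len(top_a & top_b) / k

def js_divergence(phi_a, phi_b, eps=1e-12):
    """Jensen-Shannon divergence between normalised |attribution| profiles."""
    p = np.abs(phi_a) + eps
    p = p / p.sum()
    q = np.abs(phi_b) + eps
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))  # KL divergence in bits
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical per-feature attributions from model versions A and B.
phi_a = np.array([0.9, 0.05, 0.05, 0.0])
phi_b = np.array([0.1, 0.8, 0.05, 0.05])
delta_phi = phi_b - phi_a  # the core quantity: Δφ(x) = φ_B(x) − φ_A(x)
```

A benign update would show high rank overlap and near-zero JSD between `phi_a` and `phi_b`; a reliance shift like the toy vectors above scores low overlap and high divergence.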
Instantiated via fast occlusion/clamping in standardized space with a class-anchored margin and baseline averaging, we audit 45 settings: five classical families (Logistic Regression, SVC, Random Forests, Gradient Boosting, kNN), three datasets (Breast Cancer, Wine, Digits), and three A/B pairs per family. **Findings.** Inductive-bias changes yield large, behaviour-aligned deltas (e.g., SVC poly → rbf on Breast Cancer: BAC ≈ 0.998, DCE ≈ 6.6; Random Forest feature-rule swap on Digits: BAC ≈ 0.997, DCE ≈ 7.5), while "cosmetic" tweaks (SVC `gamma=scale` vs. `auto`, kNN search) show rank-overlap@10 = 1.0 and DCE ≈ 0. The largest redistribution appears for deeper GB on Breast Cancer (JSD ≈ 0.357). Δ-Attribution offers a lightweight update audit that complements accuracy by distinguishing benign changes from behaviourally meaningful or risky reliance shifts.
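As a rough illustration of the occlusion/clamping instantiation: each feature's attribution is the drop in a model's score when that feature is clamped to a baseline value, and Δφ is the difference of these profiles across versions. The linear scorers `w_a`/`w_b` and the zero baseline below are invented for the example; the paper works with trained classifiers, a class-anchored margin, and baseline averaging in standardized space.

```python
import numpy as np

def occlusion_attribution(score, x, baseline):
    """phi_j = score(x) - score(x with feature j clamped to baseline_j)."""
    phi = np.empty_like(x, dtype=float)
    for j in range(len(x)):
        x_clamped = x.copy()
        x_clamped[j] = baseline[j]
        phi[j] = score(x) - score(x_clamped)
    return phi

# Toy linear "margins" standing in for model versions A and B (hypothetical).
w_a = np.array([1.0, 0.0, 2.0])
w_b = np.array([0.0, 1.5, 2.0])
x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)  # e.g. the feature mean in standardized space

phi_a = occlusion_attribution(lambda z: w_a @ z, x, baseline)
phi_b = occlusion_attribution(lambda z: w_b @ z, x, baseline)
delta_phi = phi_b - phi_a  # -> [-1.0, 1.5, 0.0]
```

Here Δφ cleanly localises the update: version B stopped relying on feature 0, started relying on feature 1, and left feature 2 untouched, which is exactly the kind of reliance shift the audit is meant to surface.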