🤖 AI Summary
Model updates—such as hyperparameter tuning, architectural modifications, or data shifts—frequently induce behavioral changes, yet the underlying causes remain poorly understood. This paper introduces Delta-Attribution, the first model-agnostic framework for attribution difference analysis: it quantifies discrepancies in feature attributions between model versions A and B to pinpoint factors driving behavioral shifts. Our method employs efficient masking/clamping in normalized space, class-anchored marginal attribution, and baseline-averaging to ensure robustness and fidelity. We further propose a multi-dimensional evaluation protocol that discriminates substantive bias-induced attribution migration from superficial parameter adjustments. Evaluated across 45 diverse update scenarios, Delta-Attribution accurately identifies bias-driven, behaviorally aligned attribution shifts while remaining invariant to “decorative” changes—e.g., minor hyperparameter tweaks or non-functional architectural refinements. The framework thus provides an interpretable, diagnostic tool for tracing attribution evolution during model development and deployment.
📝 Abstract
Model updates (new hyperparameters, kernels, depths, solvers, or data) change performance, but the *reason* often remains opaque. We introduce **Delta-Attribution** (Δ-Attribution), a model-agnostic framework that explains *what changed* between versions A and B by differencing per-feature attributions: Δφ(x) = φ_B(x) − φ_A(x). We evaluate Δφ with a *Δ-Attribution Quality Suite* covering magnitude/sparsity (L1, Top-k, entropy), agreement/shift (rank-overlap@10, Jensen–Shannon divergence), behavioural alignment (Delta Conservation Error, DCE; Behaviour–Attribution Coupling, BAC; COΔF), and robustness (noise, baseline sensitivity, grouped occlusion).
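The differencing and the agreement/shift metrics can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the attribution vectors `phi_a`/`phi_b` are invented stand-ins, and the exact normalisation the authors use for the Jensen–Shannon divergence may differ.

```python
import numpy as np

def rank_overlap_at_k(phi_a, phi_b, k=10):
    """Fraction of features shared by the top-k of |phi_a| and |phi_b|."""
    top_a = set(np.argsort(-np.abs(phi_a))[:k])
    top_b = set(np.argsort(-np.abs(phi_b))[:k])
    return len(top_a & top_b) / k

def js_divergence(phi_a, phi_b, eps=1e-12):
    """Jensen-Shannon divergence between normalised |attribution| profiles."""
    p = np.abs(phi_a) + eps
    p = p / p.sum()
    q = np.abs(phi_b) + eps
    q = q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))  # KL divergence in bits
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical per-feature attributions from model versions A and B.
phi_a = np.array([0.9, 0.05, 0.05, 0.0])
phi_b = np.array([0.1, 0.8, 0.05, 0.05])
delta_phi = phi_b - phi_a  # the core quantity: Δφ(x) = φ_B(x) − φ_A(x)
```

A benign update would show high rank overlap and near-zero JSD between `phi_a` and `phi_b`; a reliance shift like the toy vectors above scores low overlap and high divergence.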
Instantiated via fast occlusion/clamping in standardized space with a class-anchored margin and baseline averaging, we audit 45 settings: five classical families (Logistic Regression, SVC, Random Forests, Gradient Boosting, kNN), three datasets (Breast Cancer, Wine, Digits), and three A/B pairs per family. **Findings.** Inductive-bias changes yield large, behaviour-aligned deltas (e.g., SVC poly → rbf on Breast Cancer: BAC ≈ 0.998, DCE ≈ 6.6; Random Forest feature-rule swap on Digits: BAC ≈ 0.997, DCE ≈ 7.5), while "cosmetic" tweaks (SVC `gamma=scale` vs. `auto`, kNN search) show rank-overlap@10 = 1.0 and DCE ≈ 0. The largest redistribution appears for deeper GB on Breast Cancer (JSD ≈ 0.357). Δ-Attribution offers a lightweight update audit that complements accuracy by distinguishing benign changes from behaviourally meaningful or risky reliance shifts.
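As a rough illustration of the occlusion/clamping instantiation: each feature's attribution is the drop in a model's score when that feature is clamped to a baseline value, and Δφ is the difference of these profiles across versions. The linear scorers `w_a`/`w_b` and the zero baseline below are invented for the example; the paper works with trained classifiers, a class-anchored margin, and baseline averaging in standardized space.

```python
import numpy as np

def occlusion_attribution(score, x, baseline):
    """phi_j = score(x) - score(x with feature j clamped to baseline_j)."""
    phi = np.empty_like(x, dtype=float)
    for j in range(len(x)):
        x_clamped = x.copy()
        x_clamped[j] = baseline[j]
        phi[j] = score(x) - score(x_clamped)
    return phi

# Toy linear "margins" standing in for model versions A and B (hypothetical).
w_a = np.array([1.0, 0.0, 2.0])
w_b = np.array([0.0, 1.5, 2.0])
x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)  # e.g. the feature mean in standardized space

phi_a = occlusion_attribution(lambda z: w_a @ z, x, baseline)
phi_b = occlusion_attribution(lambda z: w_b @ z, x, baseline)
delta_phi = phi_b - phi_a  # -> [-1.0, 1.5, 0.0]
```

Here Δφ cleanly localises the update: version B stopped relying on feature 0, started relying on feature 1, and left feature 2 untouched, which is exactly the kind of reliance shift the audit is meant to surface.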