Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks

📅 2025-12-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Data attribution methods are sensitive to distributional perturbations, which undermines their practical reliability. This paper introduces a certified robust attribution framework built on a Natural Wasserstein metric and spanning both convex models and deep neural networks. Key contributions include: (1) defining a Natural Wasserstein metric that eliminates spectral amplification in representation space; (2) deriving what the authors describe as the first non-vacuous certified bounds for neural network attribution; and (3) showing that Self-Influence, which equals the Lipschitz constant governing attribution stability, gives leverage-based anomaly detection a theoretical grounding. On CIFAR-10 with ResNet-18, Natural W-TRAK certifies 68.7% of ranking pairs (versus 0% for Euclidean baselines), while Self-Influence attains an AUROC of 0.970 for label-noise detection and identifies 94.1% of mislabeled samples within the top 20% of ranked training examples.

📝 Abstract

Data attribution methods identify which training examples are responsible for a model's predictions, but their sensitivity to distributional perturbations undermines practical reliability. We present a unified framework for certified robust attribution that extends from convex models to deep networks. For convex settings, we derive Wasserstein-Robust Influence Functions (W-RIF) with provable coverage guarantees. For deep networks, we demonstrate that Euclidean certification is rendered vacuous by spectral amplification -- a mechanism where the inherent ill-conditioning of deep representations inflates Lipschitz bounds by over $10{,}000\times$. This explains why standard TRAK scores, while accurate point estimates, are geometrically fragile: naive Euclidean robustness analysis yields 0% certification. Our key contribution is the Natural Wasserstein metric, which measures perturbations in the geometry induced by the model's own feature covariance. This eliminates spectral amplification, reducing worst-case sensitivity by $76\times$ and stabilizing attribution estimates. On CIFAR-10 with ResNet-18, Natural W-TRAK certifies 68.7% of ranking pairs compared to 0% for Euclidean baselines -- to our knowledge, the first non-vacuous certified bounds for neural network attribution. Furthermore, we prove that the Self-Influence term arising from our analysis equals the Lipschitz constant governing attribution stability, providing theoretical grounding for leverage-based anomaly detection. Empirically, Self-Influence achieves 0.970 AUROC for label noise detection, identifying 94.1% of corrupted labels by examining just the top 20% of training data.
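The geometric idea in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation. It assumes a linear attribution score of the form φᵢᵀ Σ⁻¹ g, a ridge-regularized feature covariance Σ, and a Mahalanobis-type "natural" norm; the function names and the `ridge` parameter are hypothetical. Under those assumptions, the sensitivity constant in the natural norm is the square root of a leverage-style self-influence score, while the Euclidean constant depends on ‖Σ⁻¹φᵢ‖₂ and so grows with the ill-conditioning of Σ.

```python
import numpy as np

def regularized_covariance(features, ridge=1e-3):
    """Feature covariance plus a ridge term (an assumed regularizer)."""
    n, d = features.shape
    return features.T @ features / n + ridge * np.eye(d)

def self_influence(features, ridge=1e-3):
    """Leverage-style score phi_i^T Sigma^{-1} phi_i for every row phi_i."""
    cov = regularized_covariance(features, ridge)
    solved = np.linalg.solve(cov, features.T).T  # rows are Sigma^{-1} phi_i
    return np.sum(features * solved, axis=1)

def sensitivity_constants(features, ridge=1e-3):
    """Per-example bounds on |delta^T Sigma^{-1} phi_i| per unit of perturbation.

    euclidean: perturbation delta measured by ||delta||_2
    natural:   perturbation delta measured by sqrt(delta^T Sigma^{-1} delta)
    """
    cov = regularized_covariance(features, ridge)
    solved = np.linalg.solve(cov, features.T).T
    euclidean = np.linalg.norm(solved, axis=1)
    natural = np.sqrt(np.sum(features * solved, axis=1))
    return euclidean, natural
```

Both constants follow from the Cauchy-Schwarz inequality applied in the respective geometry. The two are expressed per unit of differently measured perturbation, so the sketch shows where each constant comes from rather than reproducing the reduction factors reported above.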

Problem

Research questions and friction points this paper is trying to address.

Develop certified robust attribution methods for convex and deep models

Address geometric fragility of attribution due to spectral amplification

Propose Natural Wasserstein metric to stabilize attribution estimates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Natural Wasserstein metric eliminates spectral amplification

Wasserstein-Robust Influence Functions provide certified guarantees

Self-Influence term enables robust anomaly detection
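The two detection metrics quoted for Self-Influence (AUROC, and the share of corrupted labels found in the top 20% of ranked examples) can be computed as in the sketch below. This is a generic illustration of how such metrics are defined, not the paper's evaluation code; the function names are hypothetical, and it assumes that higher scores mean "more likely mislabeled".

```python
import numpy as np

def auroc(scores, is_noisy):
    """Probability that a noisy example outscores a clean one (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    is_noisy = np.asarray(is_noisy, dtype=bool)
    pos, neg = scores[is_noisy], scores[~is_noisy]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

def recall_in_top_fraction(scores, is_noisy, fraction=0.2):
    """Share of all noisy examples that fall in the highest-scoring fraction."""
    scores = np.asarray(scores, dtype=float)
    is_noisy = np.asarray(is_noisy, dtype=bool)
    k = int(np.ceil(fraction * scores.size))
    top = np.argsort(-scores, kind="stable")[:k]  # indices of the k largest scores
    return is_noisy[top].sum() / is_noisy.sum()
```

The pairwise AUROC computation uses memory proportional to the number of noisy examples times the number of clean ones, so it suits small illustrative inputs; a rank-based formula would be the usual choice at dataset scale.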
