🤖 AI Summary
This work addresses theoretical limitations of influence functions (IF) and the single Newton step (NS) method in data attribution: existing analyses rely on unrealistic global strong convexity assumptions, and their error bounds deteriorate sharply with the parameter dimension $d$ and the number $k$ of removed samples, failing to characterize precise scaling laws. We establish the first asymptotically tight error bounds for IF and NS in non-strongly convex, high-dimensional sparse settings. Our theory explains why NS generally outperforms IF: we rigorously derive the deviation of NS from the exact retrained parameters as $\widetilde{\Theta}(kd/n^2)$, while the gap between the NS and IF estimates scales as $\widetilde{\Theta}((k+d)\sqrt{kd}/n^2)$. Leveraging local curvature characterization, expectation analysis over random subsets, and high-dimensional asymptotic tools, we further prove the average-case optimality of NS.
📝 Abstract
Data attribution aims to explain model predictions by estimating how they would change if certain training points were removed, and is used in a wide range of applications, from interpretability and credit assignment to unlearning and privacy.
Even in the relatively simple case of linear regression, existing mathematical analyses of leading data attribution methods such as Influence Functions (IF) and the single Newton Step (NS) remain limited in two key ways. First, they rely on global strong convexity assumptions, which are often not satisfied in practice. Second, the resulting bounds scale very poorly with the number of parameters ($d$) and the number of samples removed ($k$). As a result, these analyses are not tight enough to answer fundamental questions such as "what is the asymptotic scaling of the errors of each method?" or "which of these methods is more accurate for a given dataset?"
In this paper, we introduce a new analysis of the NS and IF data attribution methods for convex learning problems. To the best of our knowledge, this is the first analysis of these questions that does not assume global strong convexity, and also the first explanation of [KATL19] and [RH25a]'s observation that NS data attribution is often more accurate than IF. We prove that for sufficiently well-behaved logistic regression, our bounds are asymptotically tight up to poly-logarithmic factors, yielding scaling laws for the errors under average-case sample removals:
$$
\mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat\theta_T - \hat\theta_T^{\mathrm{NS}}\|_2 \bigr] = \widetilde{\Theta}\!\left(\frac{k d}{n^2}\right), \qquad \mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat\theta_T^{\mathrm{NS}} - \hat\theta_T^{\mathrm{IF}}\|_2 \bigr] = \widetilde{\Theta}\!\left( \frac{(k + d)\sqrt{k d}}{n^2} \right).
$$
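To make the objects being compared concrete, here is a minimal numerical sketch of the three quantities in the bounds above: exact retraining $\hat\theta_T$, the single Newton step estimate $\hat\theta_T^{\mathrm{NS}}$ (one Newton step on the leave-$T$-out loss from the full-data optimum), and the influence function estimate $\hat\theta_T^{\mathrm{IF}}$ (same correction but using the full-data Hessian). The setup is hypothetical: a small L2-regularized logistic regression on synthetic data, with constants and regularization chosen for stability rather than to match the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, lam = 500, 5, 10, 1e-3  # samples, dims, removals, L2 strength

X = rng.normal(size=(n, d))
y = (rng.random(n) < 1 / (1 + np.exp(-X @ rng.normal(size=d)))).astype(float)

def per_sample_grads(theta, X, y):
    """Row i is the gradient of the i-th log-loss term at theta."""
    p = 1 / (1 + np.exp(-X @ theta))
    return X * (p - y)[:, None]

def hessian(theta, X):
    """Hessian of the summed log-loss plus (lam/2)||theta||^2."""
    p = 1 / (1 + np.exp(-X @ theta))
    return (X.T * (p * (1 - p))) @ X + lam * np.eye(d)

def fit(X, y, iters=50):
    """Minimize sum of log-losses + (lam/2)||theta||^2 by Newton's method."""
    theta = np.zeros(d)
    for _ in range(iters):
        g = per_sample_grads(theta, X, y).sum(axis=0) + lam * theta
        theta -= np.linalg.solve(hessian(theta, X), g)
    return theta

theta_hat = fit(X, y)                       # optimum on all n points
keep = np.ones(n, dtype=bool)
keep[:k] = False                            # remove T = {0, ..., k-1}
theta_exact = fit(X[keep], y[keep])         # exact retraining

# Removing T flips the sign of its gradient contribution, so both estimators
# add H^{-1} * (sum of removed gradients); they differ only in which Hessian.
g_T = per_sample_grads(theta_hat, X[:k], y[:k]).sum(axis=0)
theta_if = theta_hat + np.linalg.solve(hessian(theta_hat, X), g_T)        # IF
theta_ns = theta_hat + np.linalg.solve(hessian(theta_hat, X[keep]), g_T)  # NS

print("NS error:", np.linalg.norm(theta_exact - theta_ns))
print("IF error:", np.linalg.norm(theta_exact - theta_if))
```

Both estimators avoid retraining; the only structural difference is that NS inverts the leave-$T$-out Hessian while IF reuses the full-data Hessian, which is exactly the gap the second bound above quantifies.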