🤖 AI Summary
Traditional influence functions rely on a low-dimensional assumption (number of parameters ≪ number of samples), so their accuracy degrades substantially in modern high-dimensional AI models, which limits their usefulness for interpretation. This work identifies the root cause of this failure and proposes Newfluence, the first influence estimation method specifically designed for high-dimensional settings. Grounded in high-dimensional and robust statistics, the authors establish a rigorous mathematical framework and introduce a corrected asymptotic approximation that overcomes the intrinsic dimensionality constraints of classical influence functions. Newfluence achieves computational efficiency comparable to standard methods while providing theoretically guaranteed higher-order accuracy. Empirical evaluations demonstrate its superior sensitivity to outliers, noisy labels, and model misspecification, outperforming both conventional influence functions and Shapley values. Newfluence establishes a new benchmark for large-model diagnostics, trustworthy interpretation, and robust optimization.
📝 Abstract
The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions. Influence functions, originating from robust statistics, have emerged as a popular approach for this purpose.
However, the heuristic foundations of influence functions rely on low-dimensional assumptions where the number of parameters $p$ is much smaller than the number of observations $n$. In contrast, modern AI models often operate in high-dimensional regimes with large $p$, challenging these assumptions.
In this paper, we examine the accuracy of influence functions in high-dimensional settings. Our theoretical and empirical analyses reveal that, in these settings, influence functions cannot reliably fulfill their intended purpose. We then introduce an alternative approximation, called Newfluence, that maintains similar computational efficiency while offering significantly improved accuracy.
Newfluence is expected to provide more accurate insights than many existing methods for interpreting complex AI models and diagnosing their issues. Moreover, the high-dimensional framework we develop in this paper can also be applied to analyze other popular techniques, such as Shapley values.
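To make the low-dimensional assumption concrete, here is a minimal sketch in a setting where the exact answer is available in closed form. It is not the Newfluence method and is not taken from the paper: ridge regression, the synthetic Gaussian data, and the names `ridge_fit`, `loo_deltas`, and `mean_relative_error` are all assumptions chosen for illustration. The sketch compares the classical influence-function estimate of the effect of deleting one observation with the exact leave-one-out change, which for ridge regression differs by the leverage factor $1/(1-h_i)$.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimate: argmin_b 0.5*||y - X b||^2 + 0.5*lam*||b||^2."""
    G = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(G, X.T @ y)


def loo_deltas(X, y, lam):
    """Change in the ridge estimate when each observation is deleted.

    Returns (if_delta, exact_delta, leverage); row i of each delta
    array approximates or equals beta_{-i} - beta.
    """
    G = X.T @ X + lam * np.eye(X.shape[1])   # Hessian of the full objective
    beta = np.linalg.solve(G, X.T @ y)
    resid = y - X @ beta                      # r_i = y_i - x_i^T beta
    D = np.linalg.solve(G, X.T).T             # row i is G^{-1} x_i
    lev = np.sum(D * X, axis=1)               # h_i = x_i^T G^{-1} x_i
    # Classical influence function: a first-order step that ignores h_i.
    if_delta = -D * resid[:, None]
    # Exact leave-one-out change (Sherman-Morrison identity).
    exact_delta = -D * (resid / (1.0 - lev))[:, None]
    return if_delta, exact_delta, lev


def mean_relative_error(n, p, lam=1.0, seed=0):
    """Average relative error of the influence-function estimate
    on synthetic Gaussian data (an illustrative assumption)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = X @ rng.standard_normal(p) + rng.standard_normal(n)
    if_delta, exact_delta, _ = loo_deltas(X, y, lam)
    err = np.linalg.norm(if_delta - exact_delta, axis=1)
    return float(np.mean(err / np.linalg.norm(exact_delta, axis=1)))


if __name__ == "__main__":
    print("p << n :", mean_relative_error(n=1000, p=10))
    print("p ~ n  :", mean_relative_error(n=200, p=150))
```

In this ridge setting the relative error of the classical estimate for observation $i$ works out algebraically to the leverage $h_i$, whose average is at most $p/n$. The error is therefore negligible when $p \ll n$ and grows large when $p$ is comparable to $n$, which is the regime the abstract is concerned with.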