🤖 AI Summary
This work addresses the exponential computational cost of game-theoretic data valuation methods, such as the Banzhaf value, in $k$-nearest neighbor ($k$NN) classifiers, which has hindered their practical adoption. The paper proves that computing the Banzhaf value in this setting is #P-hard. Despite this hardness result, the authors exploit the locality inherent in $k$NN to build exact algorithms on a dynamic programming framework: a pseudo-polynomial algorithm running in $O(Wkn^2)$ time for weighted $k$NN, where $W$ is the maximum sum of top-$k$ weights, and a specialized algorithm running in $O(nk^2)$ time for unweighted $k$NN. Monte Carlo estimation methods are provided as well. Experiments on real-world datasets indicate that the approach is efficient in practice and effective in data valuation applications.
📝 Abstract
Data valuation, the task of quantifying the contribution of individual data points to model performance, has emerged as a fundamental challenge in machine learning. Game-theoretic approaches, such as the Banzhaf value, offer principled frameworks for fair data valuation; however, they suffer from exponential computational complexity. We address this challenge by developing efficient algorithms specifically tailored for computing Banzhaf values in $k$-nearest neighbor ($k$NN) classifiers. We first establish the theoretical hardness of the problem by proving that it is #P-hard. Despite this intractability, we exploit the locality properties of $k$NN classifiers to develop practical exact algorithms. Our main contribution is a dynamic programming framework that achieves significant computational improvements: we present a pseudo-polynomial algorithm with $O(Wkn^2)$ time complexity for weighted $k$NN classifiers, where $W$ is the maximum sum of top-$k$ weights, and a specialized algorithm for unweighted $k$NN that achieves $O(nk^2)$ time complexity, which is linear in the number of data points $n$ for fixed $k$. We also offer efficient Monte Carlo estimation methods. Extensive experiments on real-world datasets demonstrate the practical efficiency of our approach and its effectiveness in data valuation applications.
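
To make the quantity being computed concrete, the sketch below evaluates the standard Banzhaf value $\phi_i = \frac{1}{2^{n-1}} \sum_{S \subseteq D \setminus \{i\}} [U(S \cup \{i\}) - U(S)]$ by brute-force enumeration and by simple Monte Carlo sampling. It is not the paper's dynamic programming algorithm; it is only a baseline that shows why naive computation scales exponentially in $n$. Everything specific here is an assumption made for illustration: the utility $U$ (the fraction of the $k$ nearest neighbors whose label matches a single test label), the 1-D features, and the function names `knn_utility`, `banzhaf_exact`, and `banzhaf_monte_carlo`.

```python
from itertools import combinations
import random


def knn_utility(subset, train, test_point, k):
    """Assumed utility: share of the k nearest points in `subset`
    whose label equals the test label (0.0 for an empty subset)."""
    x_test, y_test = test_point
    if not subset:
        return 0.0
    # Rank the chosen training indices by distance to the test feature.
    nearest = sorted(subset, key=lambda i: abs(train[i][0] - x_test))[:k]
    return sum(train[i][1] == y_test for i in nearest) / k


def banzhaf_exact(train, test_point, k):
    """Brute-force Banzhaf values: averages the marginal contribution of
    each point over all 2^(n-1) subsets of the remaining points."""
    n = len(train)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                with_i = knn_utility(subset + (i,), train, test_point, k)
                without_i = knn_utility(subset, train, test_point, k)
                total += with_i - without_i
        values.append(total / 2 ** (n - 1))
    return values


def banzhaf_monte_carlo(train, test_point, k, num_samples=2000, seed=0):
    """Plain Monte Carlo estimate: each other point joins the sampled
    subset independently with probability 1/2 (uniform over subsets)."""
    rng = random.Random(seed)
    n = len(train)
    values = []
    for i in range(n):
        total = 0.0
        for _ in range(num_samples):
            subset = tuple(j for j in range(n) if j != i and rng.random() < 0.5)
            with_i = knn_utility(subset + (i,), train, test_point, k)
            without_i = knn_utility(subset, train, test_point, k)
            total += with_i - without_i
        values.append(total / num_samples)
    return values


if __name__ == "__main__":
    # Toy data: (feature, label) pairs with distinct distances to the test point.
    train = [(0.1, 1), (0.4, 0), (0.9, 1), (1.5, 0), (2.2, 1)]
    test_point = (0.0, 1)
    print(banzhaf_exact(train, test_point, k=2))
    print(banzhaf_monte_carlo(train, test_point, k=2))
```

The exact routine performs $2^{n-1}$ utility evaluations per point, which is the exponential cost the abstract refers to. The sampling routine trades exactness for a fixed number of evaluations, in the same spirit as (but not necessarily identical to) the Monte Carlo methods mentioned above.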