AI Summary
This work addresses the high computational cost of Shapley-based data valuation. Conventional Shapley computation cannot adapt to changes in the training data or the evaluation tasks, and it has no mechanism for reusing earlier results. The authors propose D-Shap, a framework that formulates dynamic Shapley valuation as the maintenance of a player-by-task matrix. D-Shap introduces a self-valuation mechanism that constructs the initial matrix directly from the training data, so no predefined evaluation tasks are required. Dynamic updates rely on structure-aware interpolation for new tasks and localized block updates for new players, while scalable subset reuse and coverage-aware anchor selection keep the initial self-valuation tractable. Experiments show that D-Shap performs task updates in milliseconds across diverse models, reduces the cost of player updates by up to three orders of magnitude, and achieves valuation quality competitive with full recomputation.
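To make the player-by-task view concrete, here is a minimal sketch that fills a small matrix with exact Shapley values, one column per task. Everything in it is a hypothetical illustration rather than D-Shap's code: the 1-D data, the `local_1nn_utility` task, and the helper names `exact_shapley` and `build_player_task_matrix` are assumptions. Exhaustive coalition enumeration needs O(2^n) utility calls per player, which is exactly the cost that makes recomputation impractical at scale.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence

# A task's utility maps a coalition of player indices to a score.
Utility = Callable[[FrozenSet[int]], float]

# Hypothetical toy data: 1-D training points (players) and their labels.
positions = [0.0, 0.2, 0.9, 1.0, 3.0, 3.1]
labels = [0, 0, 1, 1, 0, 1]


def local_1nn_utility(x: float, y: int, radius: float) -> Utility:
    """Hypothetical task: 1.0 if the nearest player in S within `radius` of x has label y, else 0.0."""
    def utility(coalition: FrozenSet[int]) -> float:
        # Only players close to the task can influence it (utility locality).
        nearby = [p for p in coalition if abs(positions[p] - x) <= radius]
        if not nearby:
            return 0.0
        nearest = min(nearby, key=lambda p: abs(positions[p] - x))
        return 1.0 if labels[nearest] == y else 0.0
    return utility


def exact_shapley(n_players: int, utility: Utility) -> List[float]:
    """Exact Shapley values by enumerating every coalition; exponential in n_players."""
    values = [0.0] * n_players
    for i in range(n_players):
        others = [p for p in range(n_players) if p != i]
        for size in range(n_players):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n_players - size - 1) / factorial(n_players)
            for members in combinations(others, size):
                s = frozenset(members)
                # Weighted marginal contribution of player i to coalition S
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


def build_player_task_matrix(n_players: int, tasks: Sequence[Utility]) -> List[List[float]]:
    """Rows are players, columns are tasks: M[i][t] is player i's Shapley value on task t."""
    columns = [exact_shapley(n_players, u) for u in tasks]
    return [[col[i] for col in columns] for i in range(n_players)]


# Two hypothetical tasks, each a labelled query point with a locality radius.
tasks = [local_1nn_utility(0.05, 0, 0.5), local_1nn_utility(3.02, 1, 0.5)]

if __name__ == "__main__":
    for i, row in enumerate(build_player_task_matrix(len(positions), tasks)):
        print(i, [round(v, 3) for v in row])
```

In this toy matrix, players 2 and 3 lie within the radius of neither task, so they are null players and receive zero in both columns. Player 4 receives a negative value on the second task because, as the nearest point, it flips the prediction to the wrong label. Sparse, localized columns of this kind are what make it plausible to update only a few matrix blocks when something changes.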
Abstract
Shapley-based data valuation provides a principled way to quantify the contribution of training data, but its high computational cost makes it impractical in dynamic settings where tasks and training players evolve. Existing methods treat Shapley computation as a one-shot process and collapse contributions into aggregated scores, preventing reuse and requiring recomputation under any change. We introduce a new perspective that represents Shapley values as a player-by-task matrix and formulates dynamic valuation as a structured matrix maintenance problem. We exploit the fact that each task depends on a small subset of training players and that similar tasks yield similar valuations, leading to utility locality and coalition locality. Based on these insights, we propose D-Shap, a dynamic valuation framework that enables efficient updates by modifying only a small portion of the matrix: new task valuations are inferred via structure-aware interpolation, while updates induced by new players are confined to affected local matrix blocks. To eliminate the need for pre-specified evaluation tasks, we introduce self-valuation, which constructs the initial matrix directly from training data, supported by scalable subset reuse and coverage-aware anchor selection. Experiments across diverse models show that D-Shap performs task updates in milliseconds and reduces the cost of player updates by up to three orders of magnitude, while achieving valuation quality competitive with full recomputation.
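The abstract does not specify how structure-aware interpolation works. As a rough illustration of the premise that similar tasks yield similar valuations, the sketch below estimates a new task's column as a Gaussian-kernel weighted average of the columns of its k most similar existing tasks. This kernel-weighted nearest-neighbour scheme is a swapped-in assumption, not D-Shap's method; the task feature vectors and the names `interpolate_task_column`, `k`, and `bandwidth` are hypothetical.

```python
import math
from typing import List, Sequence


def interpolate_task_column(
    task_features: Sequence[Sequence[float]],
    matrix: Sequence[Sequence[float]],
    new_task: Sequence[float],
    k: int = 2,
    bandwidth: float = 1.0,
) -> List[float]:
    """Estimate a new task's column from the columns of its k most similar existing tasks.

    task_features[t] is a (hypothetical) feature vector for task t, and matrix[i][t]
    is player i's value on task t. Returns one estimated value per player.
    """
    if k < 1 or not task_features:
        raise ValueError("need k >= 1 and at least one existing task")
    # Euclidean distance from the new task to every task that already has a column
    dists = [math.dist(f, new_task) for f in task_features]
    neighbours = sorted(range(len(dists)), key=lambda t: dists[t])[:k]
    # Gaussian-kernel weights, shifted by the smallest distance so they cannot all underflow
    d_min_sq = dists[neighbours[0]] ** 2
    raw = [math.exp(-(dists[t] ** 2 - d_min_sq) / (2 * bandwidth ** 2)) for t in neighbours]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Each player's estimate is a weighted average of its values on the neighbouring tasks
    return [sum(w * row[t] for w, t in zip(weights, neighbours)) for row in matrix]


if __name__ == "__main__":
    task_features = [[0.0], [1.0], [5.0]]
    matrix = [  # 3 players x 3 existing tasks
        [0.6, 0.4, 0.0],
        [0.4, 0.6, 0.1],
        [0.0, 0.0, 0.9],
    ]
    print(interpolate_task_column(task_features, matrix, [0.5]))
```

In the usage example, the new task is equidistant from the first two tasks, so their columns are averaged with equal weight and player 2's large value on the distant third task does not leak into the estimate. Adding a task this way reads only k existing columns and makes no utility evaluations, which suggests why interpolation can be far cheaper than recomputation; how accurate the estimate is depends on whether distance in the chosen feature space actually tracks similarity of valuations.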