🤖 AI Summary
Traditional data valuation methods overlook the fact that a sample's contribution changes over time in time-series data, so they fail to assess its true utility. To address this, the authors propose a temporal Shapley valuation framework that models temporal dynamics through exponential and power exponential decay weights. By combining parallel multi-scale valuation with a sample-level adaptive fusion strategy, the approach relaxes the independent and identically distributed (i.i.d.) assumption that conventional methods usually impose. Empirical evaluations show that the proposed framework generally outperforms existing techniques in noise detection and high-value data identification, with more evident advantages in most settings with strong temporal dependence, improving the accuracy and robustness of data valuation.
📝 Abstract
With the rapid development of machine learning applications on time-series data, accurately assessing the value of training samples has become essential for data selection, noise detection, and model optimization. However, traditional data valuation methods usually assume that samples are independent and identically distributed, and thus ignore the time-varying nature of sample value in time-series data. This paper proposes an improved temporal Shapley data valuation method that values time-series samples more accurately through a temporal decay mechanism and a multi-scale fusion strategy. Specifically, we propose three progressively enhanced temporal Shapley methods. Temporal-Decay Shapley (TDS) incorporates temporal information into Shapley value computation through exponential decay weights; the improved TDS adopts power exponential decay to better adapt to nonlinear temporal drift; and Multi-Scale Temporal-Decay Shapley (MS-TDS) values samples at multiple time scales in parallel and fuses the results adaptively at the sample level, balancing the value of short-term hotspot samples against that of long-term foundational samples. Experimental results show that the proposed methods generally outperform traditional methods in noise detection and high-value data identification tasks, with more evident advantages in most settings with strong temporal dependence, thereby improving the accuracy and robustness of data valuation.
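As a rough illustration of the two decay schemes named in the abstract, the Python sketch below computes per-sample weights from sample age. The abstract does not state the formulas, so the functional forms, the function names, and the parameters `rate` and `power` are assumptions made for illustration, not the authors' definitions. How these weights enter the Shapley computation, and how MS-TDS fuses scales, is not specified here and is left out.

```python
import math


def exponential_decay(ages, rate):
    """Assumed form: w = exp(-rate * age).

    `ages` holds the time elapsed since each sample was observed
    (0 means the most recent sample); `rate` is a hypothetical decay rate.
    """
    return [math.exp(-rate * age) for age in ages]


def power_exponential_decay(ages, rate, power):
    """Assumed form: w = exp(-(rate * age) ** power).

    `power` is a hypothetical shape parameter; power = 1 gives the same
    weights as exponential_decay, other values bend the decay curve.
    """
    return [math.exp(-((rate * age) ** power)) for age in ages]


# Illustrative ages only, not data from the paper.
ages = [0.0, 1.0, 2.0, 4.0]
print(exponential_decay(ages, rate=0.5))
print(power_exponential_decay(ages, rate=0.5, power=2.0))
```

Under these assumed forms, the newest sample always has weight 1 and older samples are weighted progressively less. With `power` above 1, samples younger than `1 / rate` are discounted less than under plain exponential decay and older ones more.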