🤖 AI Summary
Problem: Quantifying the contribution of individual time points in time-series data remains challenging. Existing data contribution methods focus mainly on i.i.d. observations, so they are not designed to preserve temporal dependencies while still producing interpretable, point-level attributions.
Method: This paper introduces TimeInf, presented as the first influence-function-based data contribution framework for non-i.i.d. time-series data. It integrates temporal embedding constraints, sliding-window gradient propagation, and a time-sensitive Hessian-vector product for the second-order approximation.
Contribution/Results: TimeInf preserves temporal structure while attributing model predictions to individual time points. It outperforms state-of-the-art data contribution methods in identifying harmful anomalies and helpful time points for forecasting, and its attributions can be visualized (e.g., as heatmaps) to distinguish diverse anomaly patterns.
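The influence-function machinery mentioned in the summary can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not TimeInf's implementation. It assumes a linear autoregressive (AR) model of order p fit by ridge regression. Each overlapping sliding window of p lagged values plus its one-step-ahead target is treated as one training block. A block is scored by the classic upweighting influence on one test point's squared error, -grad(l_test)^T H^{-1} grad(l_block). H^{-1} grad(l_test) is computed by conjugate gradient using only Hessian-vector products, a common way to avoid forming H explicitly. This is a plain Hessian-vector product; the "time-sensitive" variant from the summary is not reproduced. All function names (make_windows, fit_ridge_ar, hvp, conjugate_gradient, block_influence) are hypothetical.

```python
import numpy as np

def make_windows(x, p):
    """Hypothetical helper: split a 1-D series into overlapping AR blocks.
    Row i holds the lagged inputs x[i], ..., x[i+p-1]; its target is x[i+p]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - p
    U = np.stack([x[i:i + p] for i in range(n)])  # (n, p) lagged inputs
    y = x[p:]                                       # (n,) one-step-ahead targets
    return U, y

def fit_ridge_ar(U, y, lam):
    """Minimize (1/2n)||y - U theta||^2 + (lam/2)||theta||^2 in closed form."""
    n, p = U.shape
    return np.linalg.solve(U.T @ U / n + lam * np.eye(p), U.T @ y / n)

def hvp(U, lam, v):
    """Hessian-vector product H v without forming H, where H = U^T U / n + lam I."""
    return U.T @ (U @ v) / U.shape[0] + lam * v

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    """Solve H s = b for symmetric positive-definite H using only products H v."""
    s = np.zeros_like(b, dtype=float)
    r = b - matvec(s)
    d = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Hd = matvec(d)
        alpha = rs / (d @ Hd)
        s += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return s

def block_influence(x, p, lam, u_test, y_test):
    """Upweighting influence of each training block on the test squared error.
    Positive: upweighting the block raises the test loss (harmful).
    Negative: upweighting the block lowers the test loss (helpful)."""
    U, y = make_windows(x, p)
    theta = fit_ridge_ar(U, y, lam)
    u_test = np.asarray(u_test, dtype=float)
    resid = y - U @ theta                                   # (n,) training residuals
    grads = -resid[:, None] * U                             # row i: gradient of block i's loss
    g_test = -(y_test - u_test @ theta) * u_test            # gradient of the test loss
    s_test = conjugate_gradient(lambda v: hvp(U, lam, v), g_test)  # H^{-1} g_test
    return -grads @ s_test                                   # one score per block
```

Because the loss here is quadratic, each score equals the exact derivative of the test loss with respect to that block's weight. This makes the sketch easy to check against a finite-difference refit. For neural forecasters, the same recipe would need automatic-differentiation Hessian-vector products and an approximate solver.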
📝 Abstract
Evaluating the contribution of individual data points to a model's prediction is critical for interpreting model predictions and improving model performance. Existing data contribution methods have been applied to various data types, including tabular data, images, and texts; however, their primary focus has been on i.i.d. settings. Despite the pressing need for principled approaches tailored to time-series datasets, the problem of estimating data contribution in such settings remains unexplored, possibly due to challenges associated with handling inherent temporal dependencies. This paper introduces TimeInf, a data contribution estimation method for time-series datasets. TimeInf uses influence functions to attribute model predictions to individual time points while preserving temporal structures. Our extensive empirical results demonstrate that TimeInf outperforms state-of-the-art methods in identifying harmful anomalies and helpful time points for forecasting. Additionally, TimeInf offers intuitive and interpretable attributions of data values, allowing us to easily distinguish diverse anomaly patterns through visualizations.
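The step from block-level influence to individual time points can be sketched as follows. This is a minimal sketch that assumes a simple aggregation rule: each time point receives the average score of all length-m blocks (windows) that contain it. It also assumes that positive scores mean harmful, meaning that upweighting the block raises the test loss. The block scores could come from any block-level influence estimate. The names time_point_scores and most_harmful are hypothetical, not part of any published TimeInf code.

```python
import numpy as np

def time_point_scores(block_scores, block_len):
    """Average the scores of all blocks covering each time point.

    Assumes block j covers time points j, j+1, ..., j+block_len-1, so
    n_blocks scores describe a series of n_blocks + block_len - 1 points.
    """
    block_scores = np.asarray(block_scores, dtype=float)
    n_blocks = len(block_scores)
    n_points = n_blocks + block_len - 1
    total = np.zeros(n_points)
    count = np.zeros(n_points)
    # Shifting by `offset` adds block j's score to its covered point j + offset.
    for offset in range(block_len):
        total[offset:offset + n_blocks] += block_scores
        count[offset:offset + n_blocks] += 1
    return total / count  # every point is covered by at least one block

def most_harmful(point_scores, k):
    """Indices of the k time points with the largest (most harmful) scores."""
    return np.argsort(-np.asarray(point_scores), kind="stable")[:k]
```

For example, with block_len=3 and a single harmful block at index 2, time_point_scores(np.array([0., 0., 3., 0., 0.]), 3) spreads that score evenly over the three points the block covers, giving [0, 0, 1, 1, 1, 0, 0]. Averaging rather than summing keeps points near the series edges comparable, since they are covered by fewer blocks.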