Don't Get Me Wrong: How to apply Deep Visual Interpretations to Time Series

📅 2022-03-14
🏛️ arXiv.org
📈 Citations: 6
Influential: 2
🤖 AI Summary
Deep learning models for time series suffer from poor interpretability, and existing attribution methods can yield inconsistent, hard-to-verify results on sequential data. Method: The paper proposes a framework of six orthogonal evaluation metrics for post-hoc visual interpretation methods on time series classification and segmentation tasks. The metrics apply to gradient-, propagation-, and perturbation-based explanation methods, and the authors conduct a large-scale empirical study of nine attribution methods across popular neural network architectures, using diverse datasets from the UCR repository and a complex, real-world dataset. Contribution/Results: No method consistently outperforms the others on all metrics; explanation quality depends on the task and model architecture, and standard regularization techniques such as L2 weight decay and dropout influence interpretability. The resulting insights and recommendations help experts choose suitable visualization techniques for a given model and task.
📝 Abstract
The correct interpretation and understanding of deep learning models are essential in many applications. Explanatory visual interpretation approaches for image and natural language processing allow domain experts to validate and understand almost any deep learning model. However, they fall short when generalizing to arbitrary time series, which are inherently less intuitive and more diverse. Whether a visualization explains valid reasoning or captures the actual features is difficult to judge. Hence, instead of blind trust, we need an objective evaluation to obtain trustworthy quality metrics. We propose a framework of six orthogonal metrics for gradient-, propagation-, or perturbation-based post-hoc visual interpretation methods for time series classification and segmentation tasks. An experimental study includes popular neural network architectures for time series and nine visual interpretation methods. We evaluate the visual interpretation methods with diverse datasets from the UCR repository and a complex, real-world dataset and study the influence of standard regularization techniques during training. We show that none of the methods consistently outperforms the others on all metrics, though some lead on specific ones. Our insights and recommendations allow experts to choose suitable visualization techniques for the model and task.
Problem

Research questions and friction points this paper is trying to address.

Evaluating saliency methods for interpreting convolutional models on time series
Assessing reliability of visual interpretations for diverse time series data
Providing guidelines to select saliency methods for specific datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates nine saliency methods on time series
Uses six metrics to generate recommendations
Implements a case study on tool-use time series
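To make the perturbation-based side of such an evaluation concrete, the sketch below shows one common faithfulness check for a time-series saliency map: occlude the most-attributed time steps and compare the resulting prediction drop against occluding randomly chosen steps. This is a minimal illustration of the general idea, not the paper's actual metrics; the function names, the zero baseline, and the toy linear model in the usage note are assumptions for the example.

```python
import numpy as np

def perturbation_faithfulness(predict, x, attribution, k=5, baseline=0.0, seed=0):
    """Occlusion-based faithfulness check for a 1-D time series.

    Sets the k time steps with the largest absolute attribution to a
    baseline value and measures the drop in the model's score; does the
    same for k randomly chosen steps. A faithful attribution should
    produce a larger drop than the random occlusion.

    Returns (drop_top, drop_random).
    """
    # Indices of the k most-attributed time steps.
    top = np.argsort(np.abs(attribution))[::-1][:k]
    x_top = x.copy()
    x_top[top] = baseline

    # Random occlusion of the same number of steps, for comparison.
    rng = np.random.default_rng(seed)
    rand = rng.choice(len(x), size=k, replace=False)
    x_rand = x.copy()
    x_rand[rand] = baseline

    base_score = predict(x)
    return base_score - predict(x_top), base_score - predict(x_rand)
```

As a usage example with a toy linear "model" whose gradient-times-input attribution is exact, occluding the top-attributed steps removes all of the score, so the top-k drop is at least as large as the random drop:

```python
w = np.zeros(50)
w[[3, 10, 20, 30, 40]] = [5.0, 4.0, 3.0, 2.0, 1.0]
predict = lambda x: float(w @ x)
x = np.ones(50)
attribution = w * x  # gradient * input for a linear model
drop_top, drop_rand = perturbation_faithfulness(predict, x, attribution, k=5)
```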