CNN-TFT explained by SHAP with multi-head attention weights for time series forecasting

📅 2025-10-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To jointly model local temporal patterns and long-range dependencies in multivariate time series forecasting, while maintaining interpretability, this paper proposes CNN-TFT-SHAP-MHAW: a hybrid architecture that integrates 1D convolutional neural networks (for local feature extraction) with the Temporal Fusion Transformer (TFT) (for modeling global dynamic dependencies), augmented by a novel SHAP with multi-head attention weights (SHAP-MHAW) method that enables fine-grained feature attribution. Evaluated on hydropower flow forecasting, the model achieves a mean absolute percentage error (MAPE) of 2.2%, substantially outperforming state-of-the-art models including LSTM, TCN, and Informer. Key contributions include: (1) a synergistic CNN-TFT modeling framework; (2) an attention-guided interpretability enhancement mechanism; and (3) a unified high-accuracy, high-interpretability solution tailored for industrial time-series applications.

📝 Abstract
Convolutional neural networks (CNNs) and transformer architectures offer complementary strengths for modeling temporal data: CNNs excel at capturing local patterns and translational invariances, while transformers effectively model long-range dependencies via self-attention. This paper proposes a hybrid architecture integrating convolutional feature extraction with a temporal fusion transformer (TFT) backbone to enhance multivariate time series forecasting. The CNN module first applies a hierarchy of one-dimensional convolutional layers to distill salient local patterns from raw input sequences, reducing noise and dimensionality. The resulting feature maps are then fed into the TFT, which applies multi-head attention to capture both short- and long-term dependencies and to weigh relevant covariates adaptively. We evaluate the CNN-TFT on a hydroelectric natural flow time series dataset. Experimental results demonstrate that CNN-TFT outperforms well-established deep learning models, with a mean absolute percentage error of at most 2.2%. The explainability of the model is provided by a proposed Shapley additive explanations with multi-head attention weights (SHAP-MHAW) method. Our novel architecture, named CNN-TFT-SHAP-MHAW, is promising for applications requiring high-fidelity, multivariate time series forecasts, and is available for future analysis at https://github.com/SFStefenon/CNN-TFT-SHAP-MHAW.
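The CNN-to-attention pipeline described in the abstract can be sketched in plain NumPy. The kernel bank, identity query/key/value projections, and layer sizes below are illustrative assumptions for exposition only, not the paper's actual CNN-TFT configuration:

```python
import numpy as np

def conv1d_features(x, kernels):
    """Valid-mode 1D convolution of a (T, C) multivariate sequence with a
    bank of (K, C) kernels, giving a (T-K+1, F) feature map followed by a
    ReLU -- a stand-in for the paper's CNN local-pattern extractor."""
    T, C = x.shape
    K = kernels[0].shape[0]
    out = np.zeros((T - K + 1, len(kernels)))
    for f, w in enumerate(kernels):
        for t in range(T - K + 1):
            out[t, f] = np.sum(x[t:t + K] * w)
    return np.maximum(out, 0.0)

def multi_head_attention(h, n_heads):
    """Minimal multi-head self-attention over a (T, D) feature map with
    identity projections (an assumption; the TFT learns projections).
    Returns the concatenated head outputs and per-head attention maps."""
    T, D = h.shape
    d = D // n_heads
    outs, weights = [], []
    for i in range(n_heads):
        q = k = v = h[:, i * d:(i + 1) * d]
        scores = q @ k.T / np.sqrt(d)
        a = np.exp(scores - scores.max(axis=-1, keepdims=True))
        a /= a.sum(axis=-1, keepdims=True)   # softmax over key positions
        outs.append(a @ v)
        weights.append(a)
    return np.concatenate(outs, axis=-1), np.stack(weights)
```

As in the abstract's description, the convolutional stage shortens and denoises the raw sequence before attention weighs timesteps (and, in the full TFT, covariates) adaptively; the returned attention maps are what SHAP-MHAW later draws on.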
Problem

Research questions and friction points this paper is trying to address.

Enhancing multivariate time series forecasting accuracy
Integrating convolutional networks with transformer architectures
Providing explainable predictions using SHAP with attention weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid CNN and temporal fusion transformer architecture
Multi-head attention captures short and long-term dependencies
SHAP explainability with multi-head attention weights
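The idea of combining SHAP attributions with attention weights can be illustrated with a simplified re-weighting scheme. The exact SHAP-MHAW formulation is defined in the paper; the head-averaged salience used here is an assumption made for the sketch:

```python
import numpy as np

def attention_weighted_shap(shap_values, attn_weights):
    """Illustrative re-weighting of SHAP attributions by attention
    (an assumed, simplified variant of the paper's SHAP-MHAW).

    shap_values:  (T, C) per-timestep, per-covariate SHAP attributions
    attn_weights: (H, T, T) attention maps from H heads over timesteps
    """
    # Average over heads and query positions -> per-timestep salience,
    # then normalize so the weights sum to one.
    salience = attn_weights.mean(axis=(0, 1))
    salience = salience / salience.sum()
    # Scale each timestep's attributions by how much attention it drew.
    return shap_values * salience[:, None]
```

The effect is that attributions at timesteps the model actually attends to are emphasized, while attributions at ignored timesteps are suppressed, yielding the fine-grained, attention-guided feature importance the Innovation points describe.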
Stefano F. Stefenon
Lisbon School of Engineering (ISEL), Polytechnic University of Lisbon, Lisbon 1959-007, Portugal, and Faculty of Engineering and Applied Sciences, University of Regina, Saskatchewan, S4S 0A2, Canada
João P. Matos-Carvalho
LASIGE, Informática, Faculdade de Ciências, Universidade de Lisboa, Portugal
Computer Vision · Artificial Intelligence · Deep Learning · Image Processing · UAVs
Valderi R. Q. Leithardt
Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR, 1649-026, Lisboa, Portugal
Kin-Choong Yow
Faculty of Engineering and Applied Sciences, University of Regina, Saskatchewan, S4S 0A2, Canada