Sentinel: Multi-Patch Transformer with Temporal and Channel Attention for Time Series Forecasting

📅 2025-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Transformer-based models for multivariate time series forecasting typically capture either temporal or channel-wise dependencies in isolation, failing to jointly model both and thereby limiting predictive performance. To address this, the paper proposes Sentinel, a full Transformer architecture in which the encoder applies channel attention to capture cross-variable correlations, while the decoder applies causal temporal attention to model dynamic time dependencies. Sentinel also introduces a novel multi-patch attention mechanism that replaces the standard multi-head splitting, integrating the patching process directly into the attention computation. Evaluated on multiple standard benchmarks, Sentinel achieves better or comparable performance with respect to state-of-the-art approaches in long-horizon multivariate forecasting.

📝 Abstract
Transformer-based time series forecasting has recently gained strong interest due to the ability of transformers to model sequential data. Most of the state-of-the-art architectures exploit either temporal or inter-channel dependencies, limiting their effectiveness in multivariate time-series forecasting where both types of dependencies are crucial. We propose Sentinel, a full transformer-based architecture composed of an encoder able to extract contextual information from the channel dimension, and a decoder designed to capture causal relations and dependencies across the temporal dimension. Additionally, we introduce a multi-patch attention mechanism, which leverages the patching process to structure the input sequence in a way that can be naturally integrated into the transformer architecture, replacing the multi-head splitting process. Extensive experiments on standard benchmarks demonstrate that Sentinel, because of its ability to "monitor" both the temporal and the inter-channel dimension, achieves better or comparable performance with respect to state-of-the-art approaches.
Problem

Research questions and friction points this paper is trying to address.

Improves multivariate time-series forecasting by modeling temporal and inter-channel dependencies
Introduces multi-patch attention to replace multi-head splitting in transformers
Enhances transformer architecture with channel-aware encoder and temporal-aware decoder
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer with temporal and channel attention
Multi-patch attention mechanism replaces multi-head splitting
Encoder-decoder captures channel and temporal dependencies
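The core idea above, patches taking over the role that heads play in standard multi-head attention, can be illustrated with a minimal sketch. The paper's exact formulation is not given in this summary, so the function below is an interpretation, not Sentinel's implementation: it splits the time axis into patches of length `patch_len` (an assumed parameter name) and runs scaled dot-product self-attention within each patch, instead of splitting the model dimension across heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_patch_attention(x, patch_len):
    """Illustrative sketch: patches replace heads.

    x: (seq_len, d_model) array for a single channel.
    The time axis is reshaped into (n_patches, patch_len, d_model)
    and self-attention is computed independently per patch, the way
    standard attention is computed independently per head.
    """
    seq_len, d_model = x.shape
    n_patches = seq_len // patch_len
    patches = x[: n_patches * patch_len].reshape(n_patches, patch_len, d_model)
    # Scaled dot-product attention within each patch: (N, P, P) scores.
    scores = patches @ patches.transpose(0, 2, 1) / np.sqrt(d_model)
    out = softmax(scores) @ patches  # (N, P, d_model)
    # Merge patches back into a sequence, mirroring head concatenation.
    return out.reshape(n_patches * patch_len, d_model)

x = np.random.default_rng(0).normal(size=(96, 8))
y = multi_patch_attention(x, patch_len=16)
print(y.shape)  # (96, 8)
```

In a full model following the abstract, one such attention path would run over the channel dimension in the encoder and a causally masked variant over the temporal dimension in the decoder; both are omitted here for brevity.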