T3Time: Tri-Modal Time Series Forecasting via Adaptive Multi-Head Alignment and Residual Fusion

📅 2025-08-06

🤖 AI Summary

Existing multivariate time series forecasting methods often rely on fixed inductive biases, neglect inter-variable dependencies, or employ static fusion strategies, and therefore struggle to capture dynamic, horizon-dependent temporal relationships. To address this, the authors propose a tri-modal forecasting framework that jointly models time-domain dynamics, frequency-domain characteristics, and task-specific prompt information. Specifically, they introduce a spectral branch with a gating mechanism that balances temporal and spectral features; design an adaptive multi-head cross-modal alignment module coupled with residual fusion to dynamically modulate modality-specific weights across prediction horizons; and integrate frequency-aware positional encoding, prompt learning, and a lightweight Transformer architecture to enable effective few-shot training. Evaluated on multiple benchmark datasets, the method achieves average reductions of 3.28% in MSE and 2.29% in MAE relative to state-of-the-art baselines. It also retains its advantage when trained on only 5–10% of the full training data, which indicates strong generalization in few-shot settings.
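To make the idea of "frequency-domain characteristics" concrete, the following is a minimal sketch of how a periodic structure shows up in a magnitude spectrum. It is not the paper's spectral encoder, whose details are not given here; the function name `magnitude_spectrum` and the toy series are assumptions for illustration, and the discrete Fourier transform is computed directly with the standard library.

```python
import cmath
import math


def magnitude_spectrum(series):
    """Return |DFT| of a real series for the non-negative frequency bins.

    Hypothetical helper for illustration only; a real model would
    typically use an FFT routine from a numerical library.
    """
    n = len(series)
    spectrum = []
    for k in range(n // 2 + 1):
        # k-th DFT coefficient: sum_t x[t] * exp(-2*pi*i*k*t/n)
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(series)
        )
        spectrum.append(abs(coeff))
    return spectrum


# Toy series: a sine wave with period 4 sampled over 16 steps.
series = [math.sin(2 * math.pi * t / 4) for t in range(16)]
spectrum = magnitude_spectrum(series)

# The strongest bin is at k = 16 / 4 = 4, i.e. four cycles per window.
dominant_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

A spectral branch can consume features like these so that periodicity is available directly rather than having to be inferred from raw time steps.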

📝 Abstract

Multivariate time series forecasting (MTSF) seeks to model temporal dynamics among variables to predict future trends. Transformer-based models and large language models (LLMs) have shown promise due to their ability to capture long-range dependencies and patterns. However, current methods often rely on rigid inductive biases, ignore inter-variable interactions, or apply static fusion strategies that limit adaptability across forecast horizons. These limitations create bottlenecks in capturing nuanced, horizon-specific relationships in time-series data. To solve this problem, we propose T3Time, a novel tri-modal framework consisting of time, spectral, and prompt branches. The dedicated frequency-encoding branch captures periodic structures, and a gating mechanism learns how to prioritize temporal versus spectral features based on the prediction horizon. We also propose a mechanism that adaptively aggregates multiple cross-modal alignment heads by dynamically weighting the importance of each head based on the features. Extensive experiments on benchmark datasets demonstrate that our model consistently outperforms state-of-the-art baselines, achieving an average reduction of 3.28% in MSE and 2.29% in MAE. Furthermore, it shows strong generalization in few-shot learning settings: with 5% of the training data, MSE and MAE are reduced by 4.13% and 1.91%, respectively, and with 10% of the data, by 3.62% and 1.98% on average. Code: https://github.com/monaf-chowdhury/T3Time/
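The horizon-dependent gate described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the gate's functional form, so the sketch assumes a single scalar sigmoid gate driven by the log of the horizon, and the names `horizon_gate`, `weight`, and `bias` are hypothetical. In a trained model these parameters would be learned; here they are set by hand.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def horizon_gate(temporal, spectral, horizon, weight, bias):
    """Blend temporal and spectral features with a horizon-driven gate.

    Assumed form (not from the paper):
        g     = sigmoid(weight * log(horizon) + bias)
        fused = g * temporal + (1 - g) * spectral
    """
    if len(temporal) != len(spectral):
        raise ValueError("feature vectors must have the same length")
    g = sigmoid(weight * math.log(horizon) + bias)
    fused = [g * t + (1.0 - g) * s for t, s in zip(temporal, spectral)]
    return fused, g


temporal = [1.0, 3.0]
spectral = [3.0, 1.0]

# Hand-picked parameters, purely illustrative: with a negative weight,
# the gate leans more on spectral features as the horizon grows.
_, g_short = horizon_gate(temporal, spectral, 96, weight=-1.0, bias=5.0)
_, g_long = horizon_gate(temporal, spectral, 720, weight=-1.0, bias=5.0)

# With weight = 0 and bias = 0 the gate is exactly 0.5 (a plain average).
fused_even, g_even = horizon_gate(temporal, spectral, 96, weight=0.0, bias=0.0)
```

The point of the design is that the mixing ratio is a function of the horizon rather than a fixed constant, which is what distinguishes it from a static fusion strategy. Which direction the gate moves in practice depends on what the model learns.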

Problem

Research questions and friction points this paper is trying to address.

Captures nuanced horizon-specific relationships in time-series data

Overcomes rigid inductive biases and static fusion strategies

Improves adaptability across forecast horizons via dynamic feature weighting

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tri-modal framework with time, spectral, and prompt branches

Adaptive aggregation of multiple cross-modal alignment heads via dynamic, feature-based weighting

Gating mechanism that prioritizes temporal versus spectral features based on the prediction horizon
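The adaptive aggregation of alignment heads listed above can be sketched as a softmax-weighted sum. This is a minimal illustration, assuming each head's importance is scored by a dot product between a scoring vector and that head's output; the scoring rule and the names `aggregate_heads` and `score_weights` are hypothetical and are not taken from the paper or its code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_heads(head_outputs, score_weights):
    """Combine H head outputs (each of length D) into one vector.

    Assumed scoring rule: score_h = dot(score_weights, head_outputs[h]).
    The head weights are softmax(scores), so they depend on the features
    themselves rather than being fixed ahead of time.
    """
    scores = [
        sum(w * x for w, x in zip(score_weights, head))
        for head in head_outputs
    ]
    weights = softmax(scores)
    dim = len(head_outputs[0])
    fused = [
        sum(weights[h] * head_outputs[h][d] for h in range(len(head_outputs)))
        for d in range(dim)
    ]
    return fused, weights


heads = [[1.0, 0.0], [0.0, 1.0]]

# A zero scoring vector gives every head the same weight (plain averaging).
fused_uniform, weights_uniform = aggregate_heads(heads, [0.0, 0.0])

# A scoring vector that favors the first feature up-weights the first head.
fused_adaptive, weights_adaptive = aggregate_heads(heads, [2.0, 0.0])
```

Compared with simple concatenation or averaging of heads, this lets the contribution of each head vary from input to input.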

🔎 Similar Papers

TS-HTFA: Advancing Time Series Forecasting via Hierarchical Text-Free Alignment with Large Language Models