Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting

📅 2025-05-29
🤖 AI Summary
Applying Large Vision Models (LVMs) to Long-Term Time Series Forecasting (LTSF) introduces an inductive bias toward "forecasting periods". Method: This paper proposes DMMV, a decomposition-based multi-modal view framework designed to harness this bias. DMMV combines trend-seasonal decomposition with a novel adaptive decomposition guided by backcast residuals to integrate multi-modal views of the same time series, so that the views contribute complementary temporal patterns. Contribution/Results: Compared against 14 state-of-the-art models on eight benchmark datasets, DMMV outperforms single-view and existing multi-modal baselines and achieves the lowest Mean Squared Error (MSE) on six of the eight datasets.

📝 Abstract
Time series, typically represented as numerical sequences, can also be transformed into images and texts, offering multi-modal views (MMVs) of the same underlying signal. These MMVs can reveal complementary patterns and enable the use of powerful pre-trained large models, such as large vision models (LVMs), for long-term time series forecasting (LTSF). However, as we identify in this work, applying LVMs to LTSF introduces an inductive bias towards "forecasting periods". To harness this bias, we propose DMMV, a novel decomposition-based multi-modal view framework that leverages trend-seasonal decomposition and a novel backcast-residual-based adaptive decomposition to integrate MMVs for LTSF. Comparative evaluations against 14 state-of-the-art (SOTA) models across diverse datasets show that DMMV outperforms single-view and existing multi-modal baselines, achieving the best mean squared error (MSE) on 6 out of 8 benchmark datasets.
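To make the idea of an image view concrete, here is a minimal sketch of one way a numerical series could be rendered as a 2-D array that a vision model might consume. This summary does not describe the paper's actual imaging procedure, so the period-folding scheme, the function name `series_to_image`, and the `period` parameter are all assumptions made for illustration.

```python
import numpy as np

def series_to_image(series, period):
    """Fold a 1-D series into a 2-D array with one column per period.

    Hypothetical helper: the folding scheme is an illustrative assumption,
    not the procedure used by DMMV.
    """
    x = np.asarray(series, dtype=float)
    n_cols = len(x) // period
    if n_cols == 0:
        raise ValueError("series is shorter than one period")
    x = x[-n_cols * period:]  # keep only the most recent whole periods
    lo, hi = x.min(), x.max()
    # Min-max scale to [0, 1] so values can be read as pixel intensities.
    scaled = (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
    # Each column holds one period; rows index the position within a period.
    return scaled.reshape(n_cols, period).T

# Four days of an hourly signal with a 24-step cycle.
t = np.arange(96)
image = series_to_image(np.sin(2 * np.pi * t / 24), period=24)
```

With this layout, a strictly periodic signal produces identical columns, which gives an intuition for why an image-based forecaster may lean toward repeating the period it sees.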
Problem

Research questions and friction points this paper is trying to address.

Applying large vision models to long-term time series forecasting introduces an inductive bias toward "forecasting periods"
Existing approaches do not integrate multi-modal views well enough to extract their complementary patterns
Current methods underutilize trend-seasonal decomposition in multi-modal forecasting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal views enhance time series forecasting
Decomposition-based framework integrates trend-seasonal patterns
Backcast residual adaptive decomposition improves accuracy
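The trend-seasonal decomposition named above can be sketched with a centered moving average, a common choice in the forecasting literature. This is an assumption for illustration only: the summary does not state which decomposition DMMV uses, and the sketch does not implement the backcast-residual adaptive decomposition. The function name `decompose` and the `window` parameter are hypothetical.

```python
import numpy as np

def decompose(series, window):
    """Split a series into a moving-average trend and a seasonal remainder.

    Hypothetical helper; the moving-average choice is an assumption,
    not the method confirmed by the paper summary.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centered")
    x = np.asarray(series, dtype=float)
    pad = window // 2
    # Repeat the edge values so the trend has the same length as the input.
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    seasonal = x - trend  # whatever the smooth trend does not explain
    return trend, seasonal

# A linear drift plus a 12-step cycle.
t = np.arange(48)
series = 0.1 * t + np.sin(2 * np.pi * t / 12)
trend, seasonal = decompose(series, window=13)
```

By construction the two components sum back to the input, so each part could be routed to a different view or model and the forecasts recombined by addition.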