Multimodal Forecasting for Commodity Prices Using Spectrogram-Based and Time Series Representations

📅 2026-03-28
🤖 AI Summary
This work addresses commodity price forecasting from multivariate time series, where complex cross-variable dependencies and interference from heterogeneous external factors hinder prediction accuracy. To tackle this, the authors propose a multimodal modeling paradigm that integrates time-frequency and temporal modalities. Specifically, a Morlet wavelet transform is applied to the target series to generate spectrograms, from which a Vision Transformer extracts frequency-aware features. Exogenous variables are encoded by a Transformer module, and a bidirectional cross-attention mechanism fuses the two representations. This approach captures cross-modal interactions while preserving modality-specific characteristics, improving the model's ability to discern multiscale dynamic patterns in financial time series. In experiments, the proposed method consistently outperforms seven competitive baselines across multiple commodity price prediction tasks, prediction horizons, and evaluation metrics.
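The spectrogram step described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the center frequency `omega0 = 6.0`, the truncation of the wavelet to four scales on each side, the zero padding at the edges, and the choice of scales are all assumptions made for illustration. It computes the magnitude of a continuous wavelet transform with a complex Morlet wavelet by direct convolution, using only the Python standard library.

```python
import cmath
import math


def morlet(t, scale, omega0=6.0):
    """Complex Morlet wavelet evaluated at time offset t for a given scale."""
    x = t / scale
    norm = math.pi ** -0.25 / math.sqrt(scale)
    return norm * cmath.exp(1j * omega0 * x) * math.exp(-0.5 * x * x)


def morlet_spectrogram(series, scales, omega0=6.0):
    """Return |CWT| as a list of rows, one row per scale (scales x time)."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]  # remove the constant offset
    rows = []
    for s in scales:
        # Assumption: truncate the wavelet support to +/- 4 scales.
        half = min(n - 1, int(math.ceil(4 * s)))
        kernel = [morlet(k, s, omega0).conjugate() for k in range(-half, half + 1)]
        row = []
        for t in range(n):
            acc = 0j
            for j, w in enumerate(kernel):
                idx = t + j - half
                if 0 <= idx < n:  # zero padding outside the series
                    acc += centered[idx] * w
            row.append(abs(acc))
        rows.append(row)
    return rows


# Hypothetical usage: a pure sinusoid with a period of 16 samples.
series = [math.sin(2 * math.pi * t / 16) for t in range(128)]
scales = [4, 8, 16, 32]
spectrogram = morlet_spectrogram(series, scales)
```

Each row of `spectrogram` holds the wavelet magnitude over time at one scale, so the rows stacked together form the image-like array that a patch-based encoder could consume. For long series, an FFT-based transform from a library such as PyWavelets would be a more practical choice than this direct convolution.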
📝 Abstract
Forecasting multivariate time series remains challenging due to complex cross-variable dependencies and the presence of heterogeneous external influences. This paper presents Spectrogram-Enhanced Multimodal Fusion (SEMF), which combines spectral and temporal representations for more accurate and robust forecasting. The target time series is transformed into Morlet wavelet spectrograms, from which a Vision Transformer encoder extracts localized, frequency-aware features. In parallel, exogenous variables, such as financial indicators and macroeconomic signals, are encoded via a Transformer to capture temporal dependencies and multivariate dynamics. A bidirectional cross-attention module integrates these modalities into a unified representation that preserves distinct signal characteristics while modeling cross-modal correlations. Applied to multiple commodity price forecasting tasks, SEMF achieves consistent improvements over seven competitive baselines across multiple forecasting horizons and evaluation metrics. These results demonstrate the effectiveness of multimodal fusion and spectrogram-based encoding in capturing multi-scale patterns within complex financial time series.
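The bidirectional cross-attention fusion mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not SEMF itself: it omits the learned query, key, and value projections and the multi-head structure that a real module would have, and the residual addition, mean pooling, and concatenation used to form the fused vector are hypothetical choices. The names `attend` and `bidirectional_cross_attention` are introduced here for the example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention; keys and values share the same rows."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(d)])
    return out


def mean_pool(tokens):
    """Average a token sequence into a single vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def bidirectional_cross_attention(spec_tokens, exo_tokens):
    """Fuse two token sequences by letting each modality attend to the other."""
    spec_ctx = attend(spec_tokens, exo_tokens)  # spectrogram queries, exogenous keys/values
    exo_ctx = attend(exo_tokens, spec_tokens)   # exogenous queries, spectrogram keys/values
    # Assumption: a residual add keeps modality-specific content next to the cross-modal context.
    spec_out = [[a + b for a, b in zip(x, c)] for x, c in zip(spec_tokens, spec_ctx)]
    exo_out = [[a + b for a, b in zip(x, c)] for x, c in zip(exo_tokens, exo_ctx)]
    # Assumption: pool each stream and concatenate into one fused representation.
    return mean_pool(spec_out) + mean_pool(exo_out)


# Hypothetical usage: 3 spectrogram tokens and 2 exogenous tokens of width 4.
spec_tokens = [[0.1, 0.2, 0.3, 0.4], [0.0, 1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
exo_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
fused = bidirectional_cross_attention(spec_tokens, exo_tokens)
```

Running attention in both directions lets the spectral stream pick up context from the exogenous variables and vice versa, while the residual path keeps each stream's own features, which is one way to read the abstract's claim of preserving distinct signal characteristics while modeling cross-modal correlations.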
Problem

Research questions and friction points this paper is trying to address.

multimodal forecasting
commodity prices
time series
cross-variable dependencies
heterogeneous external influences
Innovation

Methods, ideas, or system contributions that make the work stand out.

spectrogram-based representation
multimodal fusion
Vision Transformer
cross-attention mechanism
commodity price forecasting
Soyeon Park
Ph.D. candidate, Georgia Tech
Systems Security, Software Security
Doohee Chung
Handong Global University, Pohang, Republic of Korea; Impactive AI, Seoul, Republic of Korea
Charmgil Hong
Handong Global University, Pohang, Republic of Korea; Impactive AI, Seoul, Republic of Korea