Multi-Modal Time Series Prediction via Mixture of Modulated Experts

📅 2026-01-29
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of ineffective cross-modal alignment in multimodal time series forecasting, where existing approaches relying on token-level fusion struggle under conditions of scarce text–time series pairs or substantial temporal feature discrepancies. To overcome this limitation, the paper proposes a novel text-guided expert modulation mechanism that, for the first time, leverages textual signals to simultaneously govern both routing decisions and expert computations within a Mixture-of-Experts (MoE) architecture, thereby enabling direct cross-modal control over expert behavior. This approach departs from conventional fusion paradigms and significantly enhances the efficiency and robustness of cross-modal alignment. Extensive experiments across multiple multimodal time series forecasting benchmarks demonstrate consistent and substantial performance improvements, validating the method's effectiveness and generalization capability.

๐Ÿ“ Abstract
Real-world time series exhibit complex and evolving dynamics, making accurate forecasting extremely challenging. Recent multi-modal forecasting methods leverage textual information such as news reports to improve prediction, but most rely on token-level fusion that mixes temporal patches with language tokens in a shared embedding space. However, such fusion can be ill-suited when high-quality time-text pairs are scarce and when time series exhibit substantial variation in scale and characteristics, thus complicating cross-modal alignment. In parallel, Mixture-of-Experts (MoE) architectures have proven effective for both time series modeling and multi-modal learning, yet many existing MoE-based modality integration methods still depend on token-level fusion. To address this, we propose Expert Modulation, a new paradigm for multi-modal time series prediction that conditions both routing and expert computation on textual signals, enabling direct and efficient cross-modal control over expert behavior. Through comprehensive theoretical analysis and experiments, our proposed method demonstrates substantial improvements in multi-modal time series prediction. The current code is available at https://github.com/BruceZhangReve/MoME
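The Expert Modulation idea in the abstract, conditioning both the router and the experts themselves on a text signal, can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's implementation: the parameter names (`W_text_route`, `W_gamma`, `W_beta`), the FiLM-style scale/shift modulation, and the dense (top-k-free) gating are all illustrative choices, and the weights are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_text, n_experts = 8, 4, 3

# Hypothetical learned parameters (random here, for illustration only).
W_router = rng.normal(size=(d_model, n_experts)) * 0.1       # routing from the time-series token
W_text_route = rng.normal(size=(d_text, n_experts)) * 0.1    # text shifts the routing logits
W_experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.1
W_gamma = rng.normal(size=(d_text, n_experts * d_model)) * 0.1  # text -> per-expert scale
W_beta = rng.normal(size=(d_text, n_experts * d_model)) * 0.1   # text -> per-expert shift

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def modulated_moe(x, t):
    """x: (d_model,) time-series token; t: (d_text,) text embedding."""
    # 1) Text-conditioned routing: the text embedding biases the gate logits.
    gates = softmax(x @ W_router + t @ W_text_route)          # (n_experts,)

    # 2) Text-modulated experts: FiLM-style scale/shift of each expert output.
    gamma = 1.0 + (t @ W_gamma).reshape(n_experts, d_model)
    beta = (t @ W_beta).reshape(n_experts, d_model)
    expert_out = np.einsum("eij,j->ei", W_experts, x)         # (n_experts, d_model)
    expert_out = gamma * expert_out + beta

    # 3) Gate-weighted mixture of the modulated expert outputs.
    return gates @ expert_out                                 # (d_model,)

x = rng.normal(size=d_model)   # one temporal patch embedding
t = rng.normal(size=d_text)    # e.g. an encoded news snippet
y = modulated_moe(x, t)
print(y.shape)  # (8,)
```

The key contrast with token-level fusion is that `t` never enters the shared token stream: it only steers which experts fire and how they transform `x`.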
Problem

Research questions and friction points this paper is trying to address.

multi-modal time series prediction
cross-modal alignment
token-level fusion
time-text pairs
Mixture-of-Experts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert Modulation
Mixture of Experts
Multi-Modal Time Series Prediction
Cross-Modal Control
Token-Level Fusion