🤖 AI Summary
This work addresses the challenge of ineffective cross-modal alignment in multimodal time series forecasting, where existing approaches relying on token-level fusion struggle when text–time series pairs are scarce or temporal features differ substantially across series. To overcome this limitation, the paper proposes a text-guided expert modulation mechanism that, for the first time, leverages textual signals to simultaneously govern both routing decisions and expert computations within a Mixture-of-Experts (MoE) architecture, thereby enabling direct cross-modal control over expert behavior. This approach departs from conventional fusion paradigms and improves the efficiency and robustness of cross-modal alignment. Extensive experiments across multiple multimodal time series forecasting benchmarks demonstrate consistent and substantial performance improvements, validating the method's effectiveness and generalization capability.
📝 Abstract
Real-world time series exhibit complex and evolving dynamics, making accurate forecasting extremely challenging. Recent multi-modal forecasting methods leverage textual information such as news reports to improve prediction, but most rely on token-level fusion that mixes temporal patches with language tokens in a shared embedding space. Such fusion can be ill-suited when high-quality time–text pairs are scarce and when time series vary substantially in scale and characteristics, complicating cross-modal alignment. In parallel, Mixture-of-Experts (MoE) architectures have proven effective for both time series modeling and multi-modal learning, yet many existing MoE-based modality integration methods still depend on token-level fusion. To address this, we propose Expert Modulation, a new paradigm for multi-modal time series prediction that conditions both routing and expert computation on textual signals, enabling direct and efficient cross-modal control over expert behavior. Through comprehensive theoretical analysis and experiments, our proposed method demonstrates substantial improvements in multi-modal time series prediction. Code is available at https://github.com/BruceZhangReve/MoME
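To make the expert-modulation idea concrete, here is a minimal NumPy sketch of an MoE layer in which a text embedding conditions both the routing logits and each expert's computation. All names, dimensions, and the FiLM-style scale/shift modulation are illustrative assumptions for exposition, not the paper's exact formulation (see the repository linked above for the actual implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

D, E, T_DIM = 16, 4, 8  # hidden dim, number of experts, text-embedding dim

# Hypothetical parameters: a router conditioned on both the time-series token
# and the text embedding, plus per-expert scale/shift generated from the text.
W_router_x = rng.normal(0, 0.1, (D, E))       # token -> routing logits
W_router_t = rng.normal(0, 0.1, (T_DIM, E))   # text  -> routing logits
W_experts  = rng.normal(0, 0.1, (E, D, D))    # one weight matrix per expert
W_scale    = rng.normal(0, 0.1, (T_DIM, E * D))
W_shift    = rng.normal(0, 0.1, (T_DIM, E * D))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def text_modulated_moe(x, text_emb, top_k=2):
    """x: (D,) time-series token; text_emb: (T_DIM,) text embedding."""
    # 1) Text conditions the routing decision, not only the token itself.
    logits = x @ W_router_x + text_emb @ W_router_t      # (E,)
    top = np.argsort(logits)[-top_k:]                    # indices of top-k experts
    gates = softmax(logits[top])                         # renormalized gate weights
    # 2) Text also modulates each expert's computation via scale & shift.
    scale = 1.0 + (text_emb @ W_scale).reshape(E, D)
    shift = (text_emb @ W_shift).reshape(E, D)
    out = np.zeros(D)
    for g, e in zip(gates, top):
        h = np.tanh(x @ W_experts[e])                    # expert forward pass
        out += g * (scale[e] * h + shift[e])             # text-guided modulation
    return out

y = text_modulated_moe(rng.normal(size=D), rng.normal(size=T_DIM))
print(y.shape)
```

Note the contrast with token-level fusion: the text never enters the shared token sequence; it only steers which experts fire and how they transform the temporal representation.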