UniCast: A Unified Multimodal Prompting Framework for Time Series Forecasting

📅 2025-08-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing time-series foundation models (TSFMs) predominantly adopt unimodal architectures, limiting their ability to leverage ubiquitous multimodal contextual signals—such as visual and textual data—in real-world forecasting scenarios. To address this, we propose the first unified tri-modal (time series + image + text) prompt learning framework for time-series forecasting. Our approach freezes pre-trained TSFMs alongside off-the-shelf vision and language encoders, and introduces modality-specific embeddings coupled with parameter-efficient soft prompt tuning to enable cross-modal collaborative modeling and joint inference. By preserving the generalization capability of frozen foundation models while substantially enhancing inter-modal interaction, our method achieves state-of-the-art performance across multiple mainstream time-series forecasting benchmarks. Crucially, it provides the first systematic empirical validation that multimodal contextual information significantly improves forecasting accuracy.

📝 Abstract
Time series forecasting is a foundational task across domains, such as finance, healthcare, and environmental monitoring. While recent advances in Time Series Foundation Models (TSFMs) have demonstrated strong generalisation through large-scale pretraining, existing models operate predominantly in a unimodal setting, ignoring the rich multimodal context, such as visual and textual signals, that often accompanies time series data in real-world scenarios. This paper introduces a novel parameter-efficient multimodal framework, UniCast, that extends TSFMs to jointly leverage time series, vision, and text modalities for enhanced forecasting performance. Our method integrates modality-specific embeddings from pretrained Vision and Text Encoders with a frozen TSFM via soft prompt tuning, enabling efficient adaptation with minimal parameter updates. This design not only preserves the generalisation strength of the foundation model but also enables effective cross-modal interaction. Extensive experiments across diverse time-series forecasting benchmarks demonstrate that UniCast consistently and significantly outperforms all existing TSFM baselines. The findings highlight the critical role of multimodal context in advancing the next generation of general-purpose time series forecasters.
Problem

Research questions and friction points this paper is trying to address.

Existing TSFMs are unimodal, ignoring the visual and textual context that accompanies real-world time series
How to integrate visual and textual signals with time series without retraining the foundation model
How to improve forecasting accuracy while updating only a minimal number of parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal framework integrating time series, vision, text
Parameter-efficient soft prompt tuning for cross-modal interaction
Leverages pretrained encoders with frozen foundation model
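The architecture described above (frozen modality encoders producing embeddings that are joined with learnable soft prompts before entering a frozen TSFM) can be sketched roughly as follows. This is an illustrative shape-level sketch only: all dimensions, the stand-in `frozen_encoder`, and the prompt count are assumptions, not values from the paper, and the frozen encoders are mocked as fixed random projections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
D = 64                            # shared embedding width of the frozen TSFM
L_TS, L_IMG, L_TXT = 96, 16, 32   # token counts per modality
N_PROMPT = 8                      # learnable soft-prompt tokens


def frozen_encoder(x, out_dim=D):
    """Stand-in for a frozen pretrained encoder: a fixed linear projection
    into the TSFM's embedding space. In the real method these are
    off-the-shelf vision/text encoders and a TSFM patch embedder."""
    proj = rng.standard_normal((x.shape[-1], out_dim)) * (x.shape[-1] ** -0.5)
    return x @ proj


# Raw inputs (batch of 1): a time-series window, image patches, text tokens.
ts = rng.standard_normal((1, L_TS, 8))
img = rng.standard_normal((1, L_IMG, 768))
txt = rng.standard_normal((1, L_TXT, 512))

# Modality-specific embeddings from the frozen encoders.
e_ts, e_img, e_txt = frozen_encoder(ts), frozen_encoder(img), frozen_encoder(txt)

# Learnable soft prompts: in the real method these are the only trained
# parameters, updated by gradient descent while all encoders and the TSFM
# backbone stay frozen.
soft_prompts = 0.02 * rng.standard_normal((1, N_PROMPT, D))

# The unified multimodal sequence fed to the frozen TSFM backbone.
unified = np.concatenate([soft_prompts, e_ts, e_img, e_txt], axis=1)
print(unified.shape)  # (1, N_PROMPT + L_TS + L_IMG + L_TXT, D)
```

The key design point the sketch illustrates is parameter efficiency: only `soft_prompts` (here 8 × 64 values) would receive gradients, while every pretrained component contributes through frozen forward passes.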