🤖 AI Summary
Existing time-series analysis methods rely predominantly on numerical data alone, which limits their ability to integrate time-varying multimodal signals such as text, images, and audio, and leaves the reasoning capabilities of LLMs underused. This position paper argues that multimodal large language models (MLLMs) can enable more powerful and flexible reasoning over time series. Its contributions are threefold: (1) it makes the case for MLLMs in time-series understanding and outlines a research roadmap; (2) it calls for reasoning strategies that prioritize trust, interpretability, and robustness; and (3) it highlights key research directions, including novel reasoning paradigms, architectural innovations, and domain-specific applications.
📝 Abstract
Understanding time series data is crucial for many real-world applications. While large language models (LLMs) show promise in time series tasks, current approaches often rely on numerical data alone, overlooking the multimodal nature of time-dependent information, such as textual descriptions, visual data, and audio signals. Moreover, these methods underutilize LLMs' reasoning capabilities, limiting the analysis to surface-level interpretations instead of deeper temporal and multimodal reasoning. In this position paper, we argue that multimodal LLMs (MLLMs) can enable more powerful and flexible reasoning for time series analysis, improving decision-making in real-world applications. We call on researchers and practitioners to leverage this potential by developing strategies that prioritize trust, interpretability, and robust reasoning in MLLMs. Lastly, we highlight key research directions, including novel reasoning paradigms, architectural innovations, and domain-specific applications, to advance time series reasoning with MLLMs.