🤖 AI Summary
Current multimodal large language models (MLLMs) face two critical bottlenecks in modeling periodic phenomena (e.g., meteorological, traffic, and biosignals): insufficient temporal modeling capability and conflicting representations between short- and long-term periodicities. To address these challenges, we propose the first systematic solution for cross-modal periodic understanding. Our method introduces an "Easy-to-Hard Generalization" training paradigm and a "Resisting Logical Oblivion" optimization strategy; constructs the first cross-modal periodic benchmark spanning multiple difficulty levels; and integrates temporally aware embeddings, periodicity-aware attention mechanisms, and semantic alignment constraints into the MLLM architecture. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art MLLMs on periodicity detection, forecasting, and attribution tasks. Notably, it achieves, for the first time, robust periodic reasoning and generalization across textual, visual, and multimodal inputs in a unified framework.
📝 Abstract
Periodic or quasi-periodic phenomena reveal intrinsic characteristics of various natural processes, such as weather patterns, movement behaviors, traffic flows, and biological signals. Given that these phenomena span multiple modalities, the capabilities of Multimodal Large Language Models (MLLMs) offer promising potential to effectively capture and understand their complex nature. However, current MLLMs struggle with periodic tasks due to two limitations: 1) a lack of temporal modeling and 2) conflicts between short and long periods. This paper introduces Period-LLM, a multimodal large language model designed to enhance performance on periodic tasks across various modalities, and constructs a benchmark of varying difficulty for evaluating the cross-modal periodic capabilities of large models. Specifically, we adopt an "Easy to Hard Generalization" paradigm, starting with relatively simple text-based tasks and progressing to more complex visual and multimodal tasks, ensuring that the model gradually builds robust periodic reasoning capabilities. Additionally, we propose a "Resisting Logical Oblivion" optimization strategy to maintain periodic reasoning abilities during semantic alignment. Extensive experiments demonstrate the superiority of the proposed Period-LLM over existing MLLMs on periodic tasks. The code is available at https://github.com/keke-nice/Period-LLM.
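The "Easy to Hard Generalization" paradigm described above is, at its core, a curriculum over task modalities. A minimal sketch of such a schedule is shown below; the stage names, difficulty scores, and the `train_stage` callback are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of an easy-to-hard curriculum schedule.
# Stages, difficulty scores, and train_stage are hypothetical placeholders.

STAGES = [
    ("text", 1),        # simple text-based periodicity tasks first
    ("vision", 2),      # then harder single-modality visual tasks
    ("multimodal", 3),  # finally joint multimodal periodic reasoning
]

def curriculum_order(stages):
    """Return stage names sorted by ascending difficulty."""
    return [name for name, _ in sorted(stages, key=lambda s: s[1])]

def run_curriculum(stages, train_stage):
    """Train on each stage in easy-to-hard order.

    train_stage is a user-supplied callback that fine-tunes the model
    on one stage's data; here it is treated as a black box.
    """
    completed = []
    for name in curriculum_order(stages):
        train_stage(name)   # e.g., one fine-tuning phase per stage
        completed.append(name)
    return completed
```

The key design choice is simply that later stages resume from the checkpoint produced by earlier stages, so periodic reasoning learned on text carries over to visual and multimodal tasks.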