๐ค AI Summary
This work addresses the challenges of ambiguous task formulation and the lack of standardized benchmarks in time series understanding for large language models. To this end, the authors propose a four-level cognitive complexity taxonomy for time series reasoning, construct HiTSRโa dataset comprising 83k samplesโand introduce a novel multi-stage curriculum fine-tuning approach that integrates visual patterns with numerical tables, augmented with verification-oriented chain-of-thought annotations. The resulting model, LLaTiSA, demonstrates strong performance across diverse time series reasoning tasks, significantly enhancing the temporal awareness and cross-task generalization capabilities of vision-language models. Notably, LLaTiSA exhibits robust out-of-distribution generalization, underscoring its effectiveness in real-world scenarios where data distributions may shift.
๐ Abstract
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models(TSRMs). To bridge this gap, we formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity. We introduce HiTSR, a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories. Leveraging HiTSR, we propose LLaTiSA, a strong TSRM that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs). Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios. Our code is available at https://github.com/RainingNovember/LLaTiSA.