Time Series Reasoning via Process-Verifiable Thinking Data Synthesis and Scheduling for Tailored LLM Reasoning

📅 2026-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key limitations of current large language models in time-series reasoning, namely the scarcity of high-quality chain-of-thought (CoT) training data, inefficient data scheduling strategies, and reinforcement learning algorithms ill-suited to temporal reasoning tasks. To overcome these challenges, the authors propose VeriTime, a framework that constructs the first verifiable, multimodal time-series–text CoT dataset. It then introduces a task-difficulty-aware hierarchical data scheduling mechanism and a two-stage, fine-grained multi-objective reinforcement fine-tuning approach that leverages process-level CoT signals to guide optimization. Experimental results demonstrate that VeriTime substantially enhances the performance of small-scale models (3B/4B parameters) across diverse time-series reasoning benchmarks, achieving performance on par with, or even surpassing, that of large proprietary models.

📝 Abstract
Time series is a pervasive data type across various application domains, making the effective solving of diverse time series tasks a long-standing goal. Recent advances in large language models (LLMs), especially their reasoning abilities unlocked through reinforcement learning (RL), have opened new opportunities for tackling tasks with long Chain-of-Thought (CoT) reasoning. However, leveraging LLM reasoning for time series remains in its infancy, hindered by the absence of carefully curated time series CoT data for training, limited data efficiency caused by underexplored data scheduling, and the lack of RL algorithms tailored to exploiting such time series CoT data. In this paper, we introduce VeriTime, a framework that tailors LLMs for time series reasoning through data synthesis, data scheduling, and RL training. First, we propose a data synthesis pipeline that constructs a TS-text multimodal dataset with process-verifiable annotations. Second, we design a data scheduling mechanism that arranges training samples according to a principled hierarchy of difficulty and task taxonomy. Third, we develop a two-stage reinforcement fine-tuning approach featuring fine-grained, multi-objective rewards that leverage verifiable process-level CoT data. Extensive experiments show that VeriTime substantially boosts LLM performance across diverse time series reasoning tasks. Notably, it enables compact 3B/4B models to achieve reasoning capabilities on par with or exceeding those of larger proprietary LLMs.
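The abstract's second contribution, arranging training samples by a difficulty hierarchy, follows the general curriculum-learning pattern. As a rough, generic illustration (not the paper's actual scheduler, whose difficulty metric and task-taxonomy grouping are not specified here), a minimal sketch might bucket samples into easy-to-hard stages and batch within each stage; `curriculum_batches` and the CoT-length difficulty proxy below are hypothetical:

```python
import random

def curriculum_batches(samples, difficulty, n_stages=3, batch_size=4, seed=0):
    """Yield batches ordered easy-to-hard across difficulty stages.

    samples: training examples; difficulty: callable scoring a sample
    (hypothetical proxy; VeriTime's scheduler also uses task taxonomy).
    """
    rng = random.Random(seed)
    ranked = sorted(samples, key=difficulty)          # easy -> hard
    stage_len = (len(ranked) + n_stages - 1) // n_stages
    for s in range(n_stages):
        stage = ranked[s * stage_len:(s + 1) * stage_len]
        rng.shuffle(stage)  # randomize within a stage, keep stages ordered
        for i in range(0, len(stage), batch_size):
            yield stage[i:i + batch_size]

# Toy usage with CoT length as a stand-in difficulty score.
data = [{"cot_len": n} for n in [5, 40, 12, 33, 8, 21]]
batches = list(curriculum_batches(data, lambda x: x["cot_len"],
                                  n_stages=2, batch_size=2))
```

Here all short-CoT samples are scheduled before any long-CoT ones, while ordering within a stage stays randomized so the model does not see a strictly sorted stream.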
Problem

Research questions and friction points this paper is trying to address.

time series reasoning
Chain-of-Thought
large language models
data synthesis
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Time Series Reasoning
Process-Verifiable CoT
Data Synthesis
Data Scheduling
Reinforcement Fine-tuning
Jiahui Zhou
Sun Yat-sen University
Dan Li
Sun Yat-sen University
Boxin Li
Xiaomi Corporation
Xiao Zhang
Xiaomi Corporation
Erli Meng
Xiaomi Corporation
Lin Li
Sun Yat-sen University
Zhuomin Chen
PhD student at Florida International University
Jian Lou
Sun Yat-sen University
See-Kiong Ng
School of Computing and Institute of Data Science, National University of Singapore
artificial intelligence, natural language processing, data mining, smart cities, bioinformatics