🤖 AI Summary
This work addresses the limitations of both general-purpose large language models, which lack domain-specific temporal knowledge, and specialized time-series models, which exhibit constrained reasoning generalization for complex diagnostic tasks. To bridge this gap, the authors propose a hybrid knowledge injection framework that leverages reinforcement learning with a verifiable reward mechanism (RLVR) to automatically generate reasoning trajectories enriched with temporal insights. These trajectories are then injected into a general large language model under unsupervised conditions, endowing it with strong, domain-informed reasoning capabilities. The study introduces SenTSR-Bench, the first multivariate time-series diagnostic benchmark grounded in real-world industrial scenarios. Experimental results demonstrate that the proposed method significantly outperforms specialized time-series models (by 9.1%–26.1%) and general large language models (by 7.9%–22.4%) across SenTSR-Bench and multiple public datasets, substantially enhancing diagnostic accuracy and contextual awareness.
📝 Abstract
Time-series diagnostic reasoning is essential for many applications, yet existing solutions face a persistent gap: general reasoning large language models (GRLMs) possess strong reasoning skills but lack the domain-specific knowledge to understand complex time-series patterns. Conversely, fine-tuned time-series LLMs (TSLMs) understand these patterns but lack the capacity to generalize reasoning for more complicated questions. To bridge this gap, we propose a hybrid knowledge-injection framework that injects TSLM-generated insights directly into GRLM's reasoning trace, thereby achieving strong time-series reasoning with in-domain knowledge. As collecting data for knowledge injection fine-tuning is costly, we further leverage a reinforcement learning-based approach with verifiable rewards (RLVR) to elicit knowledge-rich traces without human supervision, then transfer such an in-domain thinking trace into GRLM for efficient knowledge injection. We further release SenTSR-Bench, a multivariate time-series-based diagnostic reasoning benchmark collected from real-world industrial operations. Across SenTSR-Bench and other public datasets, our method consistently surpasses TSLMs by 9.1%-26.1% and GRLMs by 7.9%-22.4%, delivering robust, context-aware time-series diagnostic insights.