🤖 AI Summary
Existing time series retrieval methods struggle to accurately localize fine-grained segments from natural language queries because they rely on expert-designed, global similarity metrics. This work proposes LaSTR, the first framework for language-driven, fine-grained time series segment retrieval. LaSTR applies the TV2 segmentation algorithm to extract local segments and employs GPT to generate high-quality textual descriptions, thereby constructing a large-scale paired dataset. Built on a Conformer backbone and trained with contrastive learning, LaSTR embeds time series segments and natural language in a shared space, achieving cross-modal semantic alignment. Experiments demonstrate that LaSTR significantly outperforms random and CLIP-based baselines across various candidate pool sizes, substantially improving retrieval ranking quality and semantic consistency.
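The contrastive alignment described above can be sketched as a CLIP-style symmetric InfoNCE objective over batches of paired (segment, caption) embeddings. This is an illustrative sketch, not LaSTR's actual implementation; the function names, the temperature value, and the use of in-batch negatives are assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere before comparison."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def symmetric_contrastive_loss(seg_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    Each segment's positive is the caption at the same batch index;
    all other captions in the batch act as negatives (and vice versa).
    seg_emb, txt_emb: (B, d) arrays of segment / caption embeddings.
    """
    s = l2_normalize(seg_emb)
    t = l2_normalize(txt_emb)
    logits = s @ t.T / temperature          # (B, B) cosine-similarity logits
    labels = np.arange(len(logits))

    def cross_entropy(lg):
        # numerically stable log-softmax along each row
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the segment-to-text and text-to-segment directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

When the paired embeddings are well aligned the loss approaches zero, and it grows as pairs are mismatched, which is the signal that pulls matching segments and captions together in the shared space.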
📝 Abstract
Effectively searching time-series data is essential for system analysis, but existing methods often require expert-designed similarity criteria or rely on global, series-level descriptions. We study language-driven segment retrieval: given a natural language query, the goal is to retrieve relevant local segments from large time-series repositories. We build large-scale segment–caption training data by applying TV2-based segmentation to LOTSA windows and generating segment descriptions with GPT-5.2, and then train a Conformer-based contrastive retriever in a shared text–time-series embedding space. On a held-out test split, we evaluate single-positive retrieval together with caption-side consistency (SBERT and VLM-as-a-judge) under multiple candidate pool sizes. Across all settings, LaSTR outperforms random and CLIP baselines, yielding improved ranking quality and stronger semantic agreement between retrieved segments and query intent.
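The single-positive retrieval evaluation over varying candidate pool sizes can be illustrated as follows: rank a pool of candidate segment embeddings by cosine similarity to the query embedding and record the rank of the one ground-truth positive. This is a hedged sketch of a standard ranking protocol, not the paper's exact evaluation code; the function names and the MRR summary metric are assumptions.

```python
import numpy as np

def retrieve_rank(query_emb, cand_embs, positive_idx):
    """Rank candidates by cosine similarity to the query and return
    the 1-based rank of the single ground-truth positive segment."""
    q = query_emb / np.linalg.norm(query_emb)
    c = cand_embs / np.linalg.norm(cand_embs, axis=1, keepdims=True)
    sims = c @ q                       # cosine similarity per candidate
    order = np.argsort(-sims)          # candidate indices, best first
    return int(np.where(order == positive_idx)[0][0]) + 1

def mean_reciprocal_rank(ranks):
    """Summarize ranks over a test split; 1.0 means the positive
    was retrieved first for every query."""
    return float(np.mean([1.0 / r for r in ranks]))
```

Growing the candidate pool makes the task harder (more distractors per query), which is why reporting results across several pool sizes, as the abstract describes, gives a fuller picture of ranking quality.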