🤖 AI Summary
This paper addresses the lack of natural language interfaces and benchmark datasets in trajectory visualization systems by proposing Text-to-TrajVis—a novel task that enables end-to-end generation of executable trajectory data visualization code from natural language queries. Methodologically, the authors introduce Trajectory Visualization Language (TVL), the first domain-specific language for trajectory visualization; construct TrajVL, the first large-scale benchmark dataset, comprising 18,140 (question, TVL) pairs; and establish a human–LLM collaborative paradigm for data synthesis and annotation. The contributions are threefold: (1) formal definition and empirical validation of the task’s feasibility and inherent challenges; (2) open-sourcing of both the TrajVL dataset and the TVL specification; and (3) development of a systematic, multi-model evaluation framework that reveals current LLMs’ limitations in generating structured visualization code.
📝 Abstract
This paper introduces the Text-to-TrajVis task, which aims to transform natural language questions into trajectory data visualizations, facilitating the development of natural language interfaces for trajectory visualization systems. As this is a novel task, no relevant dataset is currently available in the community. To address this gap, we first design a new visualization language, Trajectory Visualization Language (TVL), to facilitate querying trajectory data and generating visualizations. Building on this foundation, we propose a dataset construction method that integrates Large Language Models (LLMs) with human effort to create high-quality data. Specifically, we generate TVLs through a comprehensive and systematic process, and then label each TVL with a corresponding natural language question using LLMs. This process yields the first large-scale Text-to-TrajVis dataset, named TrajVL, which contains 18,140 (question, TVL) pairs. Based on this dataset, we systematically evaluate the performance of multiple LLMs (GPT, Qwen, Llama, etc.) on this task. The experimental results demonstrate that the task is both feasible and highly challenging, and that it merits further exploration within the research community.