TrajPrism: A Multi-Task Benchmark for Language-Grounded Urban Trajectory Understanding

📅 2026-05-11

🤖 AI Summary

This work addresses the lack of a unified evaluation framework for fine-grained, verifiable alignment between urban trajectories and natural-language descriptions, noting that existing trajectory modeling is predominantly geometry-centric and neglects language grounding. To bridge this gap, the authors propose TrajPrism, a multi-task benchmark that combines instruction-conditioned trajectory generation, language-driven semantic trajectory retrieval, and trajectory captioning under a single evaluation protocol measuring trajectory fidelity, retrieval quality, and language groundedness. Language annotations are generated under a four-dimensional travel-intent taxonomy and then judge-filtered. The benchmark comprises 300K real-world trajectories from Porto, San Francisco, and Beijing, yielding 2.1M task instances (three instruction variants, three retrieval queries, and one caption per trajectory). Three proof-of-concept models accompany the benchmark, one per task: TrajAnchor for generation, TrajFuse for retrieval, and TrajRap for captioning. Geometry-only trajectory baselines leave a large gap on the protocol, especially where language is part of the input or output.

📝 Abstract

Urban mobility is naturally expressed both as trajectories in space and as natural-language descriptions of travel intent, constraints, and preferences. However, prior work rarely evaluates these two modalities together on the same real-world trajectories: trajectory modeling often stays geometry-centric, while language-centric mobility benchmarks frequently target route planning and tool use rather than fine-grained, verifiable alignment between text and the underlying route. We introduce TrajPrism, a multi-task benchmark for language-trajectory alignment that unifies (i) instruction-conditioned trajectory generation, (ii) language-driven semantic trajectory retrieval, and (iii) trajectory captioning, together with an evaluation protocol that measures trajectory fidelity, retrieval quality, and language groundedness. We construct TrajPrism by pairing real urban trajectories with judge-filtered language annotations generated under a four-dimensional travel-intent taxonomy. The benchmark contains 300K selected trajectories across Porto, San Francisco, and Beijing, yielding 2.1M task instances from three instruction variants, three retrieval queries, and one caption per trajectory. We further develop proof-of-concept models for each task: TrajAnchor for instruction-conditioned trajectory generation, TrajFuse for semantic trajectory retrieval, and TrajRap for trajectory captioning. These models instantiate the proposed tasks and show that geometry-only trajectory baselines leave a large gap on our protocol, especially where language is part of the input-output interface. We release TrajPrism with code and a reproducible annotation pipeline that is designed to be portable across cities, given compatible trajectory inputs and map resources.
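The 2.1M figure follows directly from the per-trajectory counts given in the abstract. The short sketch below reproduces that arithmetic; the variable and function names are illustrative and do not come from the TrajPrism release, and the calculation assumes every one of the 300K trajectories carries the full set of seven annotations.

```python
# Counts taken from the abstract; all names here are illustrative only.
TRAJECTORIES = 300_000
PER_TRAJECTORY = {
    "instruction_variants": 3,  # instruction-conditioned generation
    "retrieval_queries": 3,     # semantic trajectory retrieval
    "captions": 1,              # trajectory captioning
}


def task_instances(n_trajectories, per_trajectory):
    """Return the number of task instances per task type.

    Assumes every trajectory has the full set of annotations.
    """
    return {task: n_trajectories * k for task, k in per_trajectory.items()}


counts = task_instances(TRAJECTORIES, PER_TRAJECTORY)
total = sum(counts.values())

for task, n in counts.items():
    print(f"{task}: {n:,}")
print(f"total: {total:,}")
```

Under that assumption, generation and retrieval each contribute 900K instances and captioning contributes 300K, which sums to the reported 2.1M.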

Problem

Research questions and friction points this paper is trying to address.

trajectory understanding

language grounding

urban mobility

multimodal alignment

natural language

Innovation

Methods, ideas, or system contributions that make the work stand out.

language-trajectory alignment

multi-task benchmark

urban mobility