TrajTok: Adaptive Spatial Tokenization for Trajectory Representation Learning

📅 2026-05-19

🤖 AI Summary

This work addresses the difficulties posed by raw GPS trajectories, which are continuous, noisy, and irregularly sampled. These properties prevent traditional spatial tokenization methods from achieving both fine-grained representation and discriminative pattern modeling. The authors propose TrajTok, a framework that learns multi-resolution hexagonal grids from the distribution of trajectory points to perform adaptive spatial tokenization. Combined with a factorized Transformer encoder and masked-token pretraining, TrajTok produces transferable trajectory representations. The method integrates data-driven spatial partitioning, spatiotemporal rotary position embeddings (ST-RoPE), and cross-attention fusion between modalities. On the Porto dataset, a frozen encoder with lightweight task adapters achieves strong performance on trajectory similarity search, classification, estimated time of arrival, and full travel-time regression, outperforming multiple task-specific methods.
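The summary does not say how the partition is learned from the point distribution. One simple reading, shown here as a hypothetical sketch rather than the authors' algorithm, is density-adaptive refinement. It assumes planar (projected) coordinates and a fixed ladder of resolutions in which each level halves the hexagon size. Every point takes the finest cell that still contains at least `min_count` points, so dense areas get fine tokens while sparse areas fall back to a coarse cell. The names `point_to_hex`, `adaptive_tokenize`, `build_vocab`, `base_size`, and `min_count` are illustrative.

```python
import math
from collections import Counter

def point_to_hex(x, y, size):
    """Return axial (q, r) of the pointy-top hexagon (circumradius `size`) containing (x, y)."""
    # Fractional axial coordinates for a pointy-top layout centred on the origin.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    # Cube rounding: round all three, then recompute the one with the largest error.
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def adaptive_tokenize(points, base_size, levels, min_count):
    """Give each point its finest hex cell (levels 1..levels-1) holding >= min_count points.

    Level 0 uses `base_size`; each further level halves the cell size. A point whose
    finer cells are all too sparse keeps its level-0 cell.
    """
    sizes = [base_size / 2 ** lvl for lvl in range(levels)]
    # Point counts per cell, computed separately at every resolution.
    counts = [Counter(point_to_hex(x, y, s) for x, y in points) for s in sizes]
    tokens = []
    for x, y in points:
        chosen = (0, point_to_hex(x, y, sizes[0]))  # coarse fallback
        for lvl in range(1, levels):
            cell = point_to_hex(x, y, sizes[lvl])
            if counts[lvl][cell] >= min_count:
                chosen = (lvl, cell)  # finer levels overwrite coarser ones
        tokens.append(chosen)
    return tokens

def build_vocab(tokens):
    """Map each distinct (level, (q, r)) token to a contiguous integer id."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

# Usage: a dense cluster near the origin plus two isolated points.
dense = [(0.01 * i, 0.01 * j) for i in range(5) for j in range(5)]
sparse = [(10.0, 10.0), (10.3, 9.8)]
tokens = adaptive_tokenize(dense + sparse, base_size=4.0, levels=3, min_count=3)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
```

Here the 25 clustered points share one finest-level token, while the two isolated points keep coarse level-0 tokens instead of producing rarely seen fine cells. Unlike H3-style hierarchies, these hexagons are not nested across levels, so a point's cells at neighbouring resolutions need not be parent and child; the sketch only treats the levels as alternative vocabularies.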

📝 Abstract

Learning generalizable trajectory representations from raw GPS traces remains difficult because the data is continuous, noisy, and irregularly sampled. Spatial tokenization is also challenging: fine grids yield sparse cells with weak embeddings, while coarse grids merge heterogeneous movement patterns into the same token. We present TrajTok, a trajectory encoder with a simple pretraining recipe for transferable trajectory embeddings. TrajTok first learns a multi-resolution hexagonal cell partition from the spatial distribution of GPS points, converting noisy GPS sequences into discrete cell tokens. To capture both geometry and kinematics, it uses a factorized transformer encoder with early per-modality self-attention blocks, cross-attention fusion layers, and spatiotemporal rotary position embeddings (ST-RoPE) that encode where and when each token occurs. TrajTok is pretrained with masked-token modeling that recovers both geometric structure and kinematic patterns from partial trajectory observations. On the Porto dataset, a frozen TrajTok encoder with lightweight task adapters achieves strong performance across trajectory similarity search, classification, estimated time of arrival, and full travel-time regression, outperforming multiple task-specific methods. The same frozen encoder supports both geometry-dominated and kinematics-dominated tasks, suggesting that TrajTok learns transferable trajectory structure rather than task-specific shortcuts. These results indicate that learned multi-resolution spatial tokenization combined with masked-token pretraining is a promising direction for general-purpose trajectory foundation models.
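ST-RoPE is named but not specified in the abstract. A minimal sketch, assuming it extends standard rotary embeddings by splitting the channels into three equal groups that are rotated by the token's x coordinate, y coordinate, and timestamp respectively. The group widths, frequency base, units, and the names `rope_rotate` and `st_rope` are hypothetical choices, not the paper's. What the construction inherits from RoPE is that a query-key dot product depends only on the differences in position and time between the two tokens.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Standard 1-D RoPE: rotate channel pairs (2i, 2i+1) of x by angle pos * base**(-2i/d)."""
    d = x.shape[-1]
    assert d % 2 == 0, "RoPE needs an even number of channels"
    freqs = base ** (-np.arange(0, d, 2) / d)                # shape (d/2,)
    angles = np.asarray(pos, dtype=float)[:, None] * freqs   # shape (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def st_rope(x, xs, ys, ts):
    """Hypothetical ST-RoPE: rotate three equal channel groups by x, y and time."""
    d = x.shape[-1]
    assert d % 6 == 0, "each of the three groups needs an even width"
    g = d // 3
    return np.concatenate([
        rope_rotate(x[:, :g], xs),
        rope_rotate(x[:, g:2 * g], ys),
        rope_rotate(x[:, 2 * g:], ts),
    ], axis=-1)

# Usage: 4 tokens with 12-dim queries; positions are cell-centre coordinates (km)
# and timestamps (s), both hypothetical units.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 12))
xs = np.array([0.0, 0.4, 0.9, 1.3])
ys = np.array([0.0, 0.1, 0.1, 0.3])
ts = np.array([0.0, 15.0, 30.0, 45.0])
q_rot = st_rope(q, xs, ys, ts)
```

Because each rotation preserves vector norms and composes additively, shifting every token by the same spatial offset or time delay leaves all attention scores unchanged.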

Problem

Research questions and friction points this paper is trying to address.

trajectory representation learning

spatial tokenization

GPS trajectories

multi-resolution

generalizable embeddings

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive spatial tokenization

multi-resolution hexagonal partitioning

factorized transformer
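The factorized encoder can be illustrated with a small NumPy sketch. It assumes single-head attention with random, untrained projections, post-norm residual blocks without feed-forward layers, and symmetric cross-attention fusion in which each stream queries the other. The class names, layer counts, and mean-pooled readout are hypothetical; the summary does not give the actual architecture, and ST-RoPE is omitted here for brevity.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class Attention:
    """Single-head scaled dot-product attention with random (untrained) projections."""
    def __init__(self, d, rng):
        self.wq, self.wk, self.wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

    def __call__(self, queries, context):
        q, k, v = queries @ self.wq, context @ self.wk, context @ self.wv
        weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        return weights @ v

class FactorizedEncoder:
    """Hypothetical encoder: per-modality self-attention, then cross-attention fusion."""
    def __init__(self, d, n_self, n_cross, seed=0):
        rng = np.random.default_rng(seed)
        self.self_spatial = [Attention(d, rng) for _ in range(n_self)]
        self.self_kin = [Attention(d, rng) for _ in range(n_self)]
        self.cross_s2k = [Attention(d, rng) for _ in range(n_cross)]
        self.cross_k2s = [Attention(d, rng) for _ in range(n_cross)]

    def __call__(self, spatial, kin):
        # Early blocks: each modality attends only to itself.
        for att_s, att_k in zip(self.self_spatial, self.self_kin):
            spatial = layer_norm(spatial + att_s(spatial, spatial))
            kin = layer_norm(kin + att_k(kin, kin))
        # Fusion blocks: each stream queries the other; both use the pre-update states.
        for att_sk, att_ks in zip(self.cross_s2k, self.cross_k2s):
            spatial, kin = (layer_norm(spatial + att_sk(spatial, kin)),
                            layer_norm(kin + att_ks(kin, spatial)))
        return spatial, kin

# Usage: 6 tokens, 8-dim embeddings for cell tokens and for kinematic features.
rng = np.random.default_rng(1)
spatial_in = rng.normal(size=(6, 8))
kin_in = rng.normal(size=(6, 8))
encoder = FactorizedEncoder(d=8, n_self=2, n_cross=2)
s_out, k_out = encoder(spatial_in, kin_in)
trajectory_embedding = np.concatenate([s_out.mean(axis=0), k_out.mean(axis=0)])  # shape (16,)
```

The structural point is that with `n_cross=0` the spatial output ignores the kinematic input entirely; only the fusion layers let the two modalities exchange information.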