🤖 AI Summary
This work addresses the difficulty of learning from raw GPS trajectories, which are continuous, noisy, and irregularly sampled. These properties turn fixed spatial tokenization into a trade-off: fine grids produce sparse cells with weak embeddings, while coarse grids merge heterogeneous movement patterns into a single token. The authors propose TrajTok, a framework that learns multi-resolution hexagonal grids from the distribution of trajectory points to enable adaptive spatial tokenization. Combined with a factorized Transformer encoder and masked-token pretraining, TrajTok produces transferable trajectory representations. The method integrates data-driven spatial partitioning, spatiotemporal rotary position embeddings (ST-RoPE), and cross-attention fusion across modalities. On the Porto dataset, a frozen encoder with lightweight adapters performs strongly across diverse downstream tasks, including trajectory similarity search, classification, arrival time estimation, and full-trip duration regression, outperforming multiple task-specific methods.
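The idea of adaptive multi-resolution tokenization can be illustrated with a small sketch. This is not the authors' algorithm: the summary does not say how the cells are learned, so this version assumes a simple density rule. Points are binned into pointy-top hexagons at several resolutions (level k uses a hexagon size of `base_size / 2**k`, a hypothetical choice), and each point receives the finest cell that held at least `min_count` training points. The names `fit_level_counts`, `tokenize`, `base_size`, and `min_count` are illustrative, and coordinates are assumed to be already projected to a planar system (for example, metres).

```python
import math
from collections import Counter


def hex_round(q: float, r: float) -> tuple[int, int]:
    """Round fractional axial coordinates to the containing hexagon (cube rounding)."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Re-derive the coordinate with the largest rounding error so q + r + s == 0.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return int(rq), int(rr)


def point_to_hex(x: float, y: float, size: float) -> tuple[int, int]:
    """Map a planar point to the axial (q, r) index of a pointy-top hexagon."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return hex_round(q, r)


def fit_level_counts(points, base_size: float, levels: int) -> list[Counter]:
    """Count training points per hexagon at each level; level k uses size base_size / 2**k."""
    return [
        Counter(point_to_hex(x, y, base_size / 2**level) for x, y in points)
        for level in range(levels)
    ]


def tokenize(x, y, counts, base_size: float, min_count: int) -> tuple[int, int, int]:
    """Return (level, q, r) for the finest cell holding at least min_count training points.

    Level 0 is always accepted, so sparse regions fall back to the coarsest cell.
    """
    token = (0, *point_to_hex(x, y, base_size))
    for level in range(1, len(counts)):
        cell = point_to_hex(x, y, base_size / 2**level)
        if counts[level][cell] < min_count:
            break  # too few training points here: keep the coarser token
        token = (level, *cell)
    return token


# Usage: a dense cluster near the origin plus two isolated points.
train_points = [(0.01 * i, 0.01 * j) for i in range(10) for j in range(10)]
train_points += [(50.0, 50.0), (80.0, -30.0)]
counts = fit_level_counts(train_points, base_size=1.0, levels=4)
print(tokenize(0.0, 0.0, counts, base_size=1.0, min_count=5))    # (3, 0, 0): finest level
print(tokenize(50.0, 50.0, counts, base_size=1.0, min_count=5))  # level 0: coarsest cell
```

Dense areas get fine tokens and sparse areas fall back to coarse ones, which mirrors the sparse-versus-merged trade-off described above. Unlike a hierarchical index such as H3, hexagons at successive levels in this sketch are not strictly nested, a simplification kept for brevity.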
📝 Abstract
Learning generalizable trajectory representations from raw GPS traces remains difficult because the data is continuous, noisy, and irregularly sampled. Spatial tokenization is also challenging: fine grids yield sparse cells with weak embeddings, while coarse grids merge heterogeneous movement patterns into the same token. We present TrajTok, a trajectory encoder with a simple pretraining recipe for transferable trajectory embeddings. TrajTok first learns a multi-resolution hexagonal cell partition from the spatial distribution of GPS points, converting noisy GPS sequences into discrete cell tokens. To capture both geometry and kinematics, it uses a factorized transformer encoder with early per-modality self-attention blocks, cross-attention fusion layers, and spatiotemporal rotary position embeddings (ST-RoPE) that encode where and when each token occurs. TrajTok is pretrained with masked-token modeling that recovers both geometric structure and kinematic patterns from partial trajectory observations. On the Porto dataset, a frozen TrajTok encoder with lightweight task adapters achieves strong performance across trajectory similarity search, classification, estimated time of arrival, and full travel-time regression, outperforming multiple task-specific methods. The same frozen encoder supports both geometry-dominated and kinematics-dominated tasks, suggesting that TrajTok learns transferable trajectory structure rather than task-specific shortcuts. These results indicate that learned multi-resolution spatial tokenization combined with masked-token pretraining is a promising direction for general-purpose trajectory foundation models.
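The abstract does not define ST-RoPE precisely. One plausible reading, sketched here purely as an assumption, extends standard rotary embeddings by splitting each query or key vector into three channel groups and rotating them by the token's x coordinate, y coordinate, and timestamp respectively. Because every channel pair is rotated by an angle proportional to its coordinate, the attention logit between two tokens depends only on their offset (dx, dy, dt), not on absolute position. The function `st_rope`, the equal three-way split, and `base=10000.0` are hypothetical choices.

```python
import math


def st_rope(vec, x: float, y: float, t: float, base: float = 10000.0) -> list[float]:
    """Hypothetical spatiotemporal RoPE.

    The vector is split into three equal channel groups; the pairs in group a are
    rotated by (coordinate_a * frequency_j), with coordinates (x, y, t) and
    RoPE-style frequencies base ** (-2j / group_dim).
    """
    d = len(vec)
    if d % 6 != 0:
        raise ValueError("dimension must be divisible by 6 (3 groups of channel pairs)")
    group = d // 3
    out = list(vec)
    for axis, pos in enumerate((x, y, t)):
        start = axis * group
        for j in range(group // 2):
            theta = pos * base ** (-2 * j / group)
            c, s = math.cos(theta), math.sin(theta)
            a, b = vec[start + 2 * j], vec[start + 2 * j + 1]
            # 2-D rotation of the channel pair (a, b) by angle theta.
            out[start + 2 * j] = a * c - b * s
            out[start + 2 * j + 1] = a * s + b * c
    return out


def dot(u, v) -> float:
    """Plain dot product, standing in for an unscaled attention logit."""
    return sum(a * b for a, b in zip(u, v))


# Usage: shift both tokens by the same (dx, dy, dt); the attention logit is unchanged.
q = [0.1 * i for i in range(12)]
k = [1.0 - 0.05 * i for i in range(12)]
s1 = dot(st_rope(q, 1.0, 2.0, 5.0), st_rope(k, 0.5, 1.0, 3.0))
s2 = dot(st_rope(q, 11.0, 7.0, 105.0), st_rope(k, 10.5, 6.0, 103.0))
print(abs(s1 - s2) < 1e-9)  # True
```

Giving each axis its own channel group keeps the spatial and temporal offsets separable: the logit is a sum of per-group terms, each depending on only one of dx, dy, or dt.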