Traj-MLLM: Can Multimodal Large Language Models Reform Trajectory Data Mining?

📅 2025-08-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited generalizability of existing trajectory analysis models—often constrained to specific geographic regions or single tasks—this paper proposes the first universal trajectory data mining framework based on multimodal large language models (MLLMs). Unlike conventional approaches, the method requires neither model-weight fine-tuning nor task-specific training data. Instead, raw trajectories are encoded into interleaved image-text sequences that jointly capture spatiotemporal features and multi-perspective contextual information, enabling task-agnostic adaptation via prompt engineering. This work pioneers the integration of MLLMs into trajectory analysis, unifying four distinct tasks—travel time estimation, mobility prediction, anomaly detection, and transportation mode identification—under a single architecture. Extensive experiments on four public benchmarks demonstrate substantial improvements over state-of-the-art methods, with performance gains of 48.05%, 15.52%, 51.52%, and 1.83%, respectively.

📝 Abstract
Building a general model capable of analyzing human trajectories across different geographic regions and different tasks becomes an emergent yet important problem for various applications. However, existing works suffer from the generalization problem, i.e., they are either restricted to training for specific regions or only suitable for a few tasks. Given the recent advances of multimodal large language models (MLLMs), we raise the question: can MLLMs reform current trajectory data mining and solve the problem? Nevertheless, due to the modality gap of trajectory, how to generate task-independent multimodal trajectory representations and how to adapt flexibly to different tasks remain the foundational challenges. In this paper, we propose Traj-MLLM, which is the first general framework using MLLMs for trajectory data mining. By integrating multiview contexts, Traj-MLLM transforms raw trajectories into interleaved image-text sequences while preserving key spatial-temporal characteristics, and directly utilizes the reasoning ability of MLLMs for trajectory analysis. Additionally, a prompt optimization method is proposed to finalize data-invariant prompts for task adaptation. Extensive experiments on four publicly available datasets show that Traj-MLLM outperforms state-of-the-art baselines by 48.05%, 15.52%, 51.52%, and 1.83% on travel time estimation, mobility prediction, anomaly detection, and transportation mode identification, respectively. Traj-MLLM achieves these superior performances without requiring any training data or fine-tuning the MLLM backbones.
Problem

Research questions and friction points this paper is trying to address.

Building general models for analyzing human trajectories across regions
Overcoming generalization limitations in existing trajectory mining methods
Addressing modality gaps and task adaptation in trajectory analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transforms trajectories into multimodal image-text sequences
Uses prompt optimization for task adaptation without retraining
Leverages MLLM reasoning for trajectory analysis across tasks
Shuo Liu
University of Chinese Academy of Sciences
Di Yao
Institute of Computing Technology, Chinese Academy of Sciences
Spatial-Temporal Data Mining, Trajectory Data Mining, Graph Neural Network, Time-series Analysis
Yan Lin
Department of Computer Science, Aalborg University
Gao Cong
Nanyang Technological University
Data Management, Databases, Data Mining, Spatial Databases
Jingping Bi
Institute of Computing Technology, Chinese Academy of Sciences