Traj-MLLM: Can Multimodal Large Language Models Reform Trajectory Data Mining?

📅 2025-08-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited generalizability of existing trajectory analysis models—often constrained to specific geographic regions or single tasks—this paper proposes the first universal trajectory data mining framework based on multimodal large language models (MLLMs). Unlike conventional approaches, the method requires neither model-weight fine-tuning nor task-specific training data. Instead, raw trajectories are encoded into interleaved image-text sequences that jointly capture spatiotemporal features and multi-perspective contextual information, enabling task-agnostic adaptation via prompt engineering. This work pioneers the integration of MLLMs into trajectory analysis, unifying four distinct tasks—travel time estimation, mobility prediction, anomaly detection, and transportation mode identification—under a single architecture. Extensive experiments on four public benchmarks demonstrate substantial improvements over state-of-the-art methods, with performance gains of 48.05%, 15.52%, 51.52%, and 1.83%, respectively.

📝 Abstract
Building a general model capable of analyzing human trajectories across different geographic regions and different tasks becomes an emergent yet important problem for various applications. However, existing works suffer from the generalization problem, i.e., they are either restricted to training for specific regions or only suitable for a few tasks. Given the recent advances of multimodal large language models (MLLMs), we raise the question: can MLLMs reform current trajectory data mining and solve the problem? Nevertheless, due to the modality gap of trajectory, how to generate task-independent multimodal trajectory representations and how to adapt flexibly to different tasks remain the foundational challenges. In this paper, we propose Traj-MLLM, which is the first general framework using MLLMs for trajectory data mining. By integrating multiview contexts, Traj-MLLM transforms raw trajectories into interleaved image-text sequences while preserving key spatial-temporal characteristics, and directly utilizes the reasoning ability of MLLMs for trajectory analysis. Additionally, a prompt optimization method is proposed to finalize data-invariant prompts for task adaptation. Extensive experiments on four publicly available datasets show that Traj-MLLM outperforms state-of-the-art baselines by 48.05%, 15.52%, 51.52%, and 1.83% on travel time estimation, mobility prediction, anomaly detection, and transportation mode identification, respectively. Traj-MLLM achieves these superior performances without requiring any training data or fine-tuning the MLLM backbones.
Problem

Research questions and friction points this paper is trying to address.

Building general models for analyzing human trajectories across regions
Overcoming generalization limitations in existing trajectory mining methods
Addressing modality gaps and task adaptation in trajectory analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transforms trajectories into multimodal image-text sequences
Uses prompt optimization for task adaptation without retraining
Leverages MLLM reasoning for trajectory analysis across tasks
Shuo Liu
University of Chinese Academy of Sciences
Di Yao
Institute of Computing Technology, Chinese Academy of Sciences
Spatial-Temporal Data Mining, Trajectory Data Mining, Graph Neural Network, Time-series Analysis
Yan Lin
Department of Computer Science, Aalborg University
Gao Cong
Nanyang Technological University
Data Management, Databases, Data Mining, Spatial Databases
Jingping Bi
Institute of Computing Technology, Chinese Academy of Sciences