AI Summary
Trajectory data suffer from severe noise, missing values, and privacy sensitivity, making it challenging for existing methods to simultaneously ensure privacy preservation and cross-task generalization. This paper proposes the first federated learning-enabled unified Trajectory Data Preparation (TDP) framework, enabling collaborative, cross-institutional data quality enhancement without sharing raw trajectories. Key contributions include: (1) the Trajectory Privacy Autoencoder and an LLM-driven Trajectory Knowledge Enhancer, the first adaptation of large language models to federated TDP; and (2) a federated parallel optimization mechanism that jointly guarantees privacy, generalization, and training efficiency. Extensive experiments across six real-world datasets and ten TDP tasks demonstrate state-of-the-art performance over 13 baselines: under privacy constraints, reconstruction error decreases by 27.4%, and average task F1-score improves by 19.8%.
Abstract
Trajectory data, which capture the movement patterns of people and vehicles over time and space, are crucial for applications like traffic optimization and urban planning. However, issues such as noise and incompleteness often compromise data quality, leading to inaccurate trajectory analyses and limiting the potential of these applications. While Trajectory Data Preparation (TDP) can enhance data quality, existing methods suffer from two key limitations: (i) they do not address data privacy concerns, particularly in federated settings where trajectory data sharing is prohibited, and (ii) they typically design task-specific models that lack generalizability across diverse TDP scenarios. To overcome these challenges, we propose FedTDP, a privacy-preserving and unified framework that leverages the capabilities of Large Language Models (LLMs) for TDP in federated environments. Specifically, we: (i) design a trajectory privacy autoencoder to secure data transmission and protect privacy, (ii) introduce a trajectory knowledge enhancer to improve model learning of TDP-related knowledge, enabling the development of TDP-oriented LLMs, and (iii) propose federated parallel optimization to enhance training efficiency by reducing data transmission and enabling parallel model training. Experiments on 6 real datasets and 10 mainstream TDP tasks demonstrate that FedTDP consistently outperforms 13 state-of-the-art baselines.