🤖 AI Summary
This work exposes severe user re-identification risks in de-identified GPS cycling trajectory data: even after explicit user identifiers are removed, a single GPS trajectory, characterized solely by its spatiotemporal features, can be linked to a real-world user with high accuracy. The authors propose what they describe as the first trajectory-to-user linkage attack designed for individual GPS trips rather than check-in data, combining location co-occurrence statistics, trajectory similarity modeling, and empirical risk assessment. Experiments on two real-world cycling datasets show high re-identification success rates (up to 92.3%) and indicate that simple mitigations such as trajectory truncation offer little additional protection. Notably, users who visit locations shared with few other users are more vulnerable because their movement behavior is more unique. The study reveals weaknesses in current trajectory anonymization practice, establishes a privacy threat model at the single-trip granularity, and provides an evaluation baseline for trajectory data governance.
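To make the threat model concrete, below is a minimal, hypothetical sketch of a linkage attack of this general kind: an adversary with a background set of identified trips links an anonymous trajectory to the candidate user whose known trips are spatially closest. The similarity measure (symmetric mean closest-point distance) and all names are illustrative assumptions, not the method proposed in the paper.

```python
# Illustrative sketch only -- not the authors' method. An adversary holding
# identified background trips links an anonymous trip to the most similar user.
from math import radians, sin, cos, asin, sqrt

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def trajectory_distance(t1, t2):
    """Symmetric mean closest-point distance between two GPS trajectories."""
    d12 = sum(min(haversine_m(p, q) for q in t2) for p in t1) / len(t1)
    d21 = sum(min(haversine_m(p, q) for q in t1) for p in t2) / len(t2)
    return (d12 + d21) / 2

def link_trip(anonymous_trip, background):
    """Return the candidate user whose known trips are closest to the anonymous trip."""
    return min(
        background,
        key=lambda user: min(trajectory_distance(anonymous_trip, t) for t in background[user]),
    )

# Toy example: two users with one known trip each, plus one anonymous trip.
background = {
    "user_a": [[(52.5200, 13.4050), (52.5210, 13.4070)]],
    "user_b": [[(48.1371, 11.5754), (48.1380, 11.5770)]],
}
anonymous_trip = [(52.5201, 13.4052), (52.5212, 13.4069)]
print(link_trip(anonymous_trip, background))  # -> user_a
```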
📝 Abstract
Open human mobility data is considered an essential basis for the in-depth research and analysis required for the transition to sustainable mobility and sustainable urban planning. Cycling data in particular has been the focus of data collection efforts in recent years. Although the privacy risks of location data are widely known, practitioners often refrain from applying advanced privacy mechanisms in order to avoid utility losses. Removing user identifiers from trips is thereby deemed a major privacy gain, as it supposedly prevents linking single trips to reconstruct entire movement patterns. In this paper, we propose a novel attack that reconstructs user identifiers in GPS trip datasets consisting of single trips, unlike previous attacks that evaluate trajectory-user linking on check-in data. We evaluate the remaining privacy risk for users in such datasets. Our empirical findings on two real-world datasets show that the risk of re-identification is significant even when personal identifiers have been removed, and that truncation as a simple additional privacy mechanism may not be effective in protecting user privacy. Further investigations indicate that users who frequently visit locations visited by only a small number of others tend to be more vulnerable to re-identification.
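The finding that users visiting sparsely shared locations are more exposed can be illustrated with a simple co-occurrence statistic. The sketch below, a rough illustration rather than the paper's risk measure, discretizes GPS points into grid cells, counts how many distinct users visit each cell, and reports per user the share of their cells visited by few others; the grid size and rarity threshold are assumed values.

```python
# Hedged illustration of the vulnerability intuition: users whose trips touch
# grid cells visited by few other users exhibit more unique behavior.
from collections import defaultdict

def to_cell(lat, lon, cell_deg=0.001):
    """Discretize a GPS point into a coarse grid cell (roughly 100 m at mid latitudes)."""
    return (round(lat / cell_deg), round(lon / cell_deg))

def rare_cell_share(trips_by_user, max_other_visitors=2):
    """Per user: share of visited cells seen by at most `max_other_visitors` other users."""
    visitors = defaultdict(set)
    for user, trips in trips_by_user.items():
        for trip in trips:
            for lat, lon in trip:
                visitors[to_cell(lat, lon)].add(user)
    shares = {}
    for user, trips in trips_by_user.items():
        cells = {to_cell(lat, lon) for trip in trips for lat, lon in trip}
        rare = [c for c in cells if len(visitors[c]) - 1 <= max_other_visitors]
        shares[user] = len(rare) / len(cells) if cells else 0.0
    return shares

# Usage: rare_cell_share({"user_a": [[(52.52, 13.405)]], "user_b": [[(52.52, 13.405)]]})
```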