AI Summary
This work addresses the gap between idealized experimental settings and real-world conditions in website fingerprinting (WF) attacks on Tor. It proposes a novel open-world evaluation framework from the guard node's perspective: real, unlabeled Tor traffic is collected in a privacy-preserving manner and combined with synthetic monitored traces, yielding a large-scale dataset of over 800,000 traces. Through cross-network benchmarking, the study systematically evaluates state-of-the-art attacks, shows that timing-independent classifiers are more robust to network variability, and provides the first systematic study of how Conflux's traffic splitting affects attack effectiveness. In the experiments, the best-performing attack achieves 0.956 precision and 0.922 recall at a 9% base rate, and the attacks remain robust to small training sets, network jitter, and concept drift. Notably, even with Conflux enabled, a guard node with a latency advantage can maintain high attack effectiveness.
Abstract
Website fingerprinting (WF) attacks on Tor can infer user destinations from encrypted traffic metadata. However, their real-world effectiveness remains debated because laboratory settings fail to capture network fluctuations, account for noise, or construct a representative open world. In this work, we re-examine WF from a guard-relay vantage point using a novel, privacy-preserving methodology that builds an open-world background from real, unlabeled Tor traffic paired with synthetic monitored traces. Using this methodology, we collect a large-scale dataset of over 800,000 traces. We then benchmark state-of-the-art WF attacks under a cross-network setting and show that WF remains highly effective against real Tor open-world traffic: the best-performing attack achieves 0.956 precision and 0.922 recall at a 9% base rate. We further present results that demonstrate robustness to small training sets, network jitter, and concept drift. Moreover, we show that timing-independent classifiers are significantly more robust to network variability than timing-dependent ones. Finally, we provide the first systematic study of Tor's Conflux traffic splitting, showing that a guard node with a latency advantage can maintain high attack effectiveness even when traffic is split.
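To put the reported precision, recall, and base rate in context, the following sketch back-computes the false positive rate they imply and shows how precision would shift at lower base rates. It is an illustration only, not the paper's evaluation code. It assumes a simplified binary view (monitored vs. unmonitored), and it assumes that recall and false positive rate stay fixed as the base rate changes, which the source does not claim. The function names `implied_fpr` and `precision_at` are hypothetical.

```python
def implied_fpr(precision: float, recall: float, base_rate: float) -> float:
    """False positive rate implied by precision and recall at a given base rate.

    Assumes a binary monitored/unmonitored decision (a simplification).
    """
    true_pos = base_rate * recall                      # fraction of all traces that are true positives
    false_pos = true_pos * (1 - precision) / precision  # from precision = TP / (TP + FP)
    return false_pos / (1 - base_rate)                 # normalize by the unmonitored fraction


def precision_at(base_rate: float, recall: float, fpr: float) -> float:
    """Precision at a different base rate, holding recall and FPR fixed (an assumption)."""
    true_pos = base_rate * recall
    false_pos = (1 - base_rate) * fpr
    return true_pos / (true_pos + false_pos)


# Figures reported in the abstract: 0.956 precision, 0.922 recall, 9% base rate.
fpr = implied_fpr(precision=0.956, recall=0.922, base_rate=0.09)
print(f"implied FPR: {fpr:.4%}")

# Hypothetical lower base rates, chosen only for illustration.
for base_rate in (0.09, 0.01, 0.001):
    p = precision_at(base_rate, recall=0.922, fpr=fpr)
    print(f"base rate {base_rate:.3f}: precision {p:.3f}")
```

Under these assumptions the implied false positive rate is roughly 0.4%. The loop illustrates why the base rate is reported alongside precision in open-world evaluations: with the same classifier behavior, precision drops as monitored traffic becomes rarer.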