Reality Check for Tor Website Fingerprinting in the Open World

2026-03-08
Citations: 0
Influential: 0
AI Summary
This work addresses the gap between idealized experimental settings and real-world conditions in Tor website fingerprinting (WF) attacks. It proposes the first open-world evaluation framework from the guard-node vantage point, combining privacy-preserving collection of real, unlabeled Tor traffic with synthetic monitored traces to build a large-scale dataset of over 800,000 traffic instances. Through cross-network benchmarking, the study systematically evaluates state-of-the-art attacks, shows that timing-agnostic classifiers are more robust under dynamic network conditions, and provides the first quantitative assessment of how Conflux traffic splitting affects attack efficacy. The best-performing attack achieves 0.956 precision and 0.922 recall at a 9% base rate and remains robust to limited training samples, network jitter, and concept drift. Notably, even with Conflux enabled, a guard node with a latency advantage sustains high identification performance.
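Precision at a fixed base rate depends on the false-positive rate as well as on recall, which is why the base rate is reported next to the two numbers. For a base rate r, the standard relation is precision = r·TPR / (r·TPR + (1 − r)·FPR). The Python sketch below is a back-of-the-envelope illustration, not the paper's evaluation code. It assumes the figures come from a binary monitored-versus-unmonitored decision and that the base rate is the fraction of test traces belonging to monitored sites. Under those assumptions it inverts the relation to get the false-positive rate consistent with the reported precision and recall, then shows how precision would change if the same detector faced a lower base rate. The function names are hypothetical.

```python
def precision_at_base_rate(tpr: float, fpr: float, base_rate: float) -> float:
    """Precision of a binary monitored-vs-unmonitored detector at a given base rate.

    Assumption: base_rate is the fraction of traces that belong to monitored sites.
    """
    tp = base_rate * tpr          # expected true-positive mass
    fp = (1.0 - base_rate) * fpr  # expected false-positive mass
    return tp / (tp + fp)


def implied_fpr(precision: float, tpr: float, base_rate: float) -> float:
    """Invert precision_at_base_rate: the FPR consistent with a given precision and recall."""
    return base_rate * tpr * (1.0 - precision) / (precision * (1.0 - base_rate))


if __name__ == "__main__":
    r, recall, prec = 0.09, 0.922, 0.956  # figures quoted in the summary above
    fpr = implied_fpr(prec, recall, r)
    print(f"implied FPR ~ {fpr:.4f}")
    print(f"round-trip precision = {precision_at_base_rate(recall, fpr, r):.3f}")
    # Hypothetical scenario: same TPR/FPR, but only 1% of traces are monitored.
    print(f"precision at 1% base rate ~ {precision_at_base_rate(recall, fpr, 0.01):.3f}")
```

Under these assumptions the implied false-positive rate is roughly 0.42%, and the same detector's precision at a 1% base rate would drop to about 0.69. Both are derived figures that illustrate the base-rate effect, not results reported by the authors.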

πŸ“ Abstract
Website fingerprinting (WF) attacks on Tor can infer user destinations from encrypted traffic metadata. However, their real-world effectiveness remains debated because laboratory settings fail to capture network fluctuations, to evaluate noise, and to create a representative open world. In this work, we re-examine WF from a guard-relay vantage point using a novel, privacy-preserving methodology that builds an open-world background from real, unlabeled Tor traffic paired with synthetic monitored traces. Using this methodology, we collect a large-scale dataset of over 800,000 traces. We then benchmark state-of-the-art WF attacks under a cross-network setting and show that WF remains highly effective against real Tor open-world traffic: the best-performing attack achieves 0.956 precision and 0.922 recall at a 9% base rate. We further demonstrate robustness to small training sets, network jitter, and concept drift. Moreover, we show that timing-independent classifiers are significantly more robust to network variability than classifiers that rely on timing features. Finally, we provide the first systematic study of Tor's Conflux traffic splitting and show that a guard node with a latency advantage can maintain high attack effectiveness even when traffic is split.
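Why timing-independent classifiers tolerate network variability better can be seen on a toy trace. The Python sketch below is a hypothetical illustration, not one of the benchmarked attacks. It represents a trace as (timestamp, direction) pairs, models jitter as random stretching or shrinking of inter-packet gaps that preserves packet order (an assumption; the paper's jitter model is not described here), and compares a direction-only view of the trace with its inter-arrival times. All names are illustrative.

```python
import random
from typing import List, Tuple

# Hypothetical trace format: (timestamp_seconds, direction), +1 = outgoing, -1 = incoming.
Trace = List[Tuple[float, int]]


def add_jitter(trace: Trace, max_scale: float, rng: random.Random) -> Trace:
    """Scale each inter-packet gap by a random factor in [1/max_scale, max_scale].

    Packet order is preserved, so only timing changes (a simplifying assumption).
    """
    if max_scale < 1.0:
        raise ValueError("max_scale must be >= 1")
    if not trace:
        return []
    jittered: Trace = []
    t = prev = trace[0][0]
    for ts, direction in trace:
        gap = ts - prev
        t += gap * rng.uniform(1.0 / max_scale, max_scale)
        prev = ts
        jittered.append((t, direction))
    return jittered


def direction_features(trace: Trace) -> List[int]:
    """Timing-independent view: the ordered sequence of packet directions."""
    return [direction for _, direction in trace]


def timing_features(trace: Trace) -> List[float]:
    """Timing-dependent view: inter-arrival times between consecutive packets."""
    return [b[0] - a[0] for a, b in zip(trace, trace[1:])]


if __name__ == "__main__":
    rng = random.Random(0)
    original = [(0.000, 1), (0.050, -1), (0.052, -1), (0.120, 1), (0.200, -1)]
    jittered = add_jitter(original, max_scale=2.0, rng=rng)
    print(direction_features(original) == direction_features(jittered))  # True
    print(timing_features(original))
    print(timing_features(jittered))
```

The direction sequence survives this kind of distortion unchanged while the inter-arrival times do not. Real networks can also reorder or drop cells, which would perturb direction sequences too, so the sketch only isolates the effect of pure timing noise.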
Problem

Research questions and friction points this paper is trying to address.

website fingerprinting
Tor
open-world
traffic analysis
privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

website fingerprinting
open-world evaluation
Tor network
traffic splitting
timing-independent classifiers
Mohammadhamed Shadbeh
Simon Fraser University
Khashayar Khajavi
Simon Fraser University
Tao Wang
Assistant Professor, Simon Fraser University
Privacy, Cybersecurity, Machine Learning