🤖 AI Summary
Tor is vulnerable to traffic correlation attacks, but existing approaches are not robust to noise injection or partial observation, and they scale poorly because they rely on pairwise matching. This paper proposes RECTor, an efficient machine learning-based deanonymization framework: it combines attention-based Multiple Instance Learning (MIL) with GRU-based temporal encoding inside a Siamese network that learns robust traffic representations, and pairs this with approximate nearest neighbor (aNN) search for scalable, high-speed matching. The method significantly improves noise resilience, achieving up to 60% higher true positive rates under high-noise conditions, while reducing both training and inference time by over 50%. Its inference cost grows near-linearly with the number of flows, balancing accuracy, efficiency, and scalability.
📝 Abstract
Tor is a widely used anonymity network that conceals user identities by routing traffic through encrypted relays, yet it remains vulnerable to traffic correlation attacks that deanonymize users by matching patterns in ingress and egress traffic. Existing correlation methods suffer from two major limitations: limited robustness to noise and partial observations, and poor scalability due to computationally expensive pairwise matching. To address these challenges, we propose RECTor, a machine learning-based framework for traffic correlation under realistic conditions. RECTor employs attention-based Multiple Instance Learning (MIL) and GRU-based temporal encoding to extract robust flow representations, even when traffic data is incomplete or obfuscated. These embeddings are mapped into a shared space via a Siamese network and efficiently matched using approximate nearest neighbor (aNN) search. Empirical evaluations show that RECTor outperforms state-of-the-art baselines such as DeepCorr, DeepCOFFEA, and FlowTracker, achieving up to 60% higher true positive rates under high-noise conditions and reducing training and inference time by over 50%. Moreover, RECTor demonstrates strong scalability: inference cost grows near-linearly as the number of flows increases. These findings reveal critical vulnerabilities in Tor's anonymity model and highlight the need for advanced model-aware defenses.
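To make the two mechanisms named in the abstract more concrete, the following is a minimal sketch, not RECTor's implementation. It assumes that a flow is split into windows, that each window already has an embedding, and that a fixed `query` vector stands in for learned attention parameters; the GRU encoder and Siamese training are omitted entirely. Random-hyperplane hashing is used here only as one simple way to illustrate approximate nearest neighbor search, and the abstract does not say which aNN method the authors use. All function names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    """Scale to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(dot(vec, vec)) or 1.0
    return [x / norm for x in vec]


def attention_pool(instances, query):
    """Attention-style MIL pooling: weight each window embedding by the
    softmax of its dot product with `query`, then return the weighted sum.
    `query` is an assumed stand-in for learned attention parameters."""
    weights = softmax([dot(inst, query) for inst in instances])
    dim = len(instances[0])
    pooled = [sum(w * inst[d] for w, inst in zip(weights, instances))
              for d in range(dim)]
    return pooled, weights


def lsh_key(vec, planes):
    """Random-hyperplane hash: one sign bit per plane."""
    return tuple(dot(vec, plane) >= 0 for plane in planes)


def build_index(embeddings, planes):
    """Group embedding indices into buckets keyed by their hash."""
    index = {}
    for i, emb in enumerate(embeddings):
        index.setdefault(lsh_key(emb, planes), []).append(i)
    return index


def match(query, embeddings, index, planes):
    """Return the most similar embedding within the query's bucket, or None
    if the bucket is empty. Only bucket members are compared, which is what
    avoids a full pairwise scan (at the cost of possible misses)."""
    candidates = index.get(lsh_key(query, planes), [])
    if not candidates:
        return None
    return max(candidates, key=lambda i: dot(query, embeddings[i]))


# Toy usage with made-up 3-dimensional embeddings.
rng = random.Random(0)
planes = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
exit_flows = [normalize(v) for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0])]
index = build_index(exit_flows, planes)

windows = [[0.0, 0.9, 0.1], [0.0, 1.0, 0.0]]  # per-window embeddings of one entry flow
entry_embedding, weights = attention_pool(windows, query=[0.0, 1.0, 0.0])
best = match(normalize(entry_embedding), exit_flows, index, planes)
```

In this sketch a lookup touches only the flows that share the query's hash bucket, so the work per query does not grow with the total number of flows in the way an exhaustive pairwise comparison does. A production system would more likely rely on a dedicated aNN library and on multiple hash tables or graph-based indexes to reduce missed matches.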