Duumviri: Detecting Trackers and Mixed Trackers with a Breakage Detector

📅 2024-02-12
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing tracker detection methods often block mixed trackers (resources that combine tracking with essential functionality), causing webpage breakage. Method: This paper proposes Duumviri, a breakage-aware tracker detection framework. It introduces the first fully automated detection method for mixed trackers; models the impact of blocking a request via differential features; and integrates a machine learning-based breakage detector into the detection pipeline to avoid blocking functional resources. Contributions/Results: On 15K pages, the framework replicates EasyPrivacy's labels for non-mixed trackers with 97.44% accuracy; for mixed trackers it achieves a lower-bound accuracy of 74.19%. It detects and confirms 22 previously unreported trackers and 26 previously unreported mixed trackers. It also identifies overly strict EasyPrivacy rules that cause breakage, showing how privacy protection and web usability can be weighed together.

📝 Abstract
Web tracking harms user privacy. As a result, the use of tracker detection and blocking tools is a common practice among Internet users. However, no such tool can be perfect, and thus there is a trade-off between avoiding breakage (caused by unintentionally blocking some required functionality) and neglecting to block some trackers. State-of-the-art tools usually rely on user reports and developer effort to detect breakages, which can be broadly attributed to two causes: 1) misidentifying non-trackers as trackers, and 2) blocking mixed trackers, which blend tracking with functional components. We propose incorporating a machine learning-based breakage detector into the tracker detection pipeline to automatically avoid misidentification of functional resources. For both tracker detection and breakage detection, we propose using differential features that can more clearly elucidate the differences caused by blocking a request. We designed and implemented a prototype of our proposed approach, Duumviri, for non-mixed trackers. We then adapt it to automatically identify mixed trackers, drawing differential features at partial-request granularity. In the case of non-mixed trackers, evaluating Duumviri on 15K pages shows its ability to replicate the labels of a human-generated filter list, EasyPrivacy, with an accuracy of 97.44%. Through a manual analysis, we find that Duumviri can identify previously unreported trackers and that its breakage detector can identify overly strict EasyPrivacy rules that cause breakage. In the case of mixed trackers, Duumviri is the first automated mixed tracker detector, and achieves a lower bound accuracy of 74.19%. Duumviri has enabled us to detect and confirm 22 previously unreported unique trackers and 26 unique mixed trackers.
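
The idea of differential features can be illustrated with a minimal sketch: load a page once normally and once with a single request blocked, then describe the request by how the page changed. The `PageLoad` fields and the feature names below are hypothetical and chosen only for illustration; the abstract does not list the features Duumviri actually extracts.

```python
from dataclasses import dataclass


@dataclass
class PageLoad:
    """Hypothetical summary of one page visit (fields are illustrative assumptions)."""
    num_requests: int
    num_dom_nodes: int
    num_cookies_set: int
    visible_text_chars: int


def differential_features(control: PageLoad, treatment: PageLoad) -> dict[str, int]:
    """Return treatment-minus-control differences for each measured signal.

    `control` is the unmodified page load; `treatment` is the load with the
    request under test blocked.
    """
    return {
        "d_requests": treatment.num_requests - control.num_requests,
        "d_dom_nodes": treatment.num_dom_nodes - control.num_dom_nodes,
        "d_cookies_set": treatment.num_cookies_set - control.num_cookies_set,
        "d_visible_text_chars": treatment.visible_text_chars - control.visible_text_chars,
    }


# Example with made-up numbers: blocking the request removes follow-up
# requests and cookies but barely changes the rendered page.
control = PageLoad(num_requests=120, num_dom_nodes=3400,
                   num_cookies_set=14, visible_text_chars=52000)
treatment = PageLoad(num_requests=97, num_dom_nodes=3390,
                     num_cookies_set=6, visible_text_chars=51800)
features = differential_features(control, treatment)
```

In this made-up example the page loses follow-up requests and cookies while the visible content is nearly unchanged, which is the kind of contrast a downstream classifier could learn from.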

Problem

Research questions and friction points this paper is trying to address.

Detecting and blocking web trackers to protect user privacy.
Avoiding breakage caused by misidentifying non-trackers as trackers.
Automating detection of mixed trackers blending tracking with functional components.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning-based breakage detector integration
Differential features for tracker and breakage detection
Automated mixed tracker detection with partial-request granularity
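
One way to picture how a breakage detector could sit inside the tracker detection pipeline is the sketch below. It assumes two classifier scores per request and combines them into a decision. The thresholds, the `Decision` names, and the three-way outcome are assumptions made for illustration, not the decision rule described in the paper.

```python
from enum import Enum


class Decision(Enum):
    BLOCK = "block"
    ALLOW = "allow"
    # Scored as a tracker, but blocking it appears to break the page:
    # either a misidentified functional resource or a mixed tracker.
    NEEDS_FINER_ANALYSIS = "needs_finer_analysis"


def decide(tracker_score: float,
           breakage_score: float,
           tracker_threshold: float = 0.5,
           breakage_threshold: float = 0.5) -> Decision:
    """Combine a tracker score and a breakage score (both assumed in [0, 1])."""
    is_tracker = tracker_score >= tracker_threshold
    causes_breakage = breakage_score >= breakage_threshold

    if not is_tracker:
        return Decision.ALLOW
    if causes_breakage:
        # Do not block outright; hand off to analysis at finer granularity.
        return Decision.NEEDS_FINER_ANALYSIS
    return Decision.BLOCK
```

Under these assumptions, `decide(0.9, 0.1)` blocks the request, `decide(0.9, 0.8)` defers it for finer-grained analysis, and `decide(0.2, 0.9)` allows it.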