🤖 AI Summary
This study addresses the challenge posed by server-side Google Analytics (sGA). Browser-enforced restrictions on client-side tracking have pushed trackers toward server-side deployments, and sGA evades existing anti-tracking mechanisms because tracking requests are routed through intermediate endpoints instead of being sent directly to known tracker endpoints. To counter this, the authors propose SST-Guard, a system that uses a semantic-value-template-based multimodal detection approach. By matching semantic values, such as identifiers and event metadata, in data sent to any endpoint involved in collection or sharing, SST-Guard detects and blocks sGA without relying on predefined tracking endpoints. This design is intended to withstand evasion techniques such as endpoint customization and payload obfuscation. An evaluation on the Tranco top-10k websites identifies 403 sGA domains with over 93% accuracy, and a measurement of the top-150k sites finds 6,314 websites using sGA.
📝 Abstract
As web browsers increasingly restrict client-side tracking, the web tracking ecosystem is shifting from client-side to server-side tracking (SST). In SST, the browser sends tracking requests to an intermediate endpoint, which then forwards them to the tracker's endpoint, eliminating direct client-to-tracker requests. As a result, existing tracking protections that block requests to known tracker endpoints are rendered ineffective.
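To make the failure mode concrete, the following minimal sketch models an endpoint blocklist of the kind described above. The blocklist contents, the first-party hostname `metrics.example.com`, and the query string are hypothetical illustrations, not taken from the paper: the same GA-style payload is blocked when sent to Google's host but passes when sent to a first-party intermediate endpoint.

```python
from urllib.parse import urlsplit

# Hypothetical hostname blocklist, similar in spirit to filter lists
# that target client-side Google Analytics.
BLOCKED_HOSTS = {"google-analytics.com", "googletagmanager.com"}


def is_blocked(url: str) -> bool:
    """Return True if the request host is, or is a subdomain of, a blocked host."""
    host = urlsplit(url).hostname or ""
    return any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS)


# Client-side tracking: the browser contacts the tracker's endpoint directly.
client_side = "https://region1.google-analytics.com/g/collect?v=2&tid=G-ABC1234&en=page_view"

# Server-side tracking: the same payload goes to an intermediate endpoint
# (hypothetical first-party host), which forwards it to the tracker.
server_side = "https://metrics.example.com/g/collect?v=2&tid=G-ABC1234&en=page_view"

print(is_blocked(client_side))  # True
print(is_blocked(server_side))  # False
```

Because the check only looks at the destination host, nothing in the payload itself influences the decision, which is why moving the endpoint is enough to bypass it.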
In this paper, we investigate the server-side implementation of Google Analytics, the most widely deployed third-party tracking service on the web today. We also present SST-Guard, a multi-modal, browser-based system for detecting and blocking server-side Google Analytics (sGA). Our key insight is that even when the tracker's endpoints change, sGA must still collect and share the same semantic information as client-side Google Analytics (e.g., identifiers and event metadata). Therefore, rather than detecting requests to known Google Analytics endpoints, SST-Guard aims to detect the underlying artifacts of collecting and sharing these semantic values with any endpoint. Operationalizing this insight is challenging because real-world sGA deployments commonly customize endpoints and obfuscate URLs and payloads. SST-Guard addresses this challenge with a value-template approach that uses regular expressions to match semantic value patterns across multiple modalities: network requests, cookies, and the window object.
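The value-template idea can be illustrated with a small sketch. The templates below are hypothetical and only loosely modeled on publicly visible GA4 value formats (a `G-` measurement ID, a `<random>.<timestamp>` client ID, the `GA1.x.` prefix of the `_ga` cookie, common event names); the actual SST-Guard templates are not given in this abstract. Each modality (request URL, cookies, a flattened snapshot of window properties, whose format is assumed here) is reduced to a list of strings and matched against the same templates, without ever consulting the endpoint's hostname.

```python
import re
from urllib.parse import parse_qsl, unquote, urlsplit

# Hypothetical value templates (assumptions, not the paper's templates).
VALUE_TEMPLATES = {
    "measurement_id": re.compile(r"\bG-[A-Z0-9]{6,12}\b"),
    "client_id": re.compile(r"\b\d{5,10}\.\d{10}\b"),
    "ga_cookie": re.compile(r"\bGA1\.\d\.\d{5,10}\.\d{10}\b"),
    "event_name": re.compile(r"\b(?:page_view|scroll|user_engagement|first_visit)\b"),
}


def match_templates(values):
    """Return the names of all templates that match any of the given strings."""
    hits = set()
    for value in values:
        text = unquote(value)  # undo percent-encoding before matching
        for name, pattern in VALUE_TEMPLATES.items():
            if pattern.search(text):
                hits.add(name)
    return hits


def scan_request(url):
    """Network-request modality: match the path and every query value."""
    parts = urlsplit(url)
    values = [parts.path] + [v for _, v in parse_qsl(parts.query)]
    return match_templates(values)


def scan_cookies(cookies):
    """Cookie modality: match cookie values from a {name: value} dict."""
    return match_templates(cookies.values())


def scan_window(window_props):
    """Window-object modality: match a flattened {property: value} snapshot."""
    return match_templates(str(v) for v in window_props.values())


# A request to a customized, hypothetical endpoint still exposes GA-like values.
req = "https://metrics.example.com/x9/c?v=2&tid=G-ABC1234&cid=1234567890.1690000000&en=page_view"
print(sorted(scan_request(req)))  # ['client_id', 'event_name', 'measurement_id']
print(sorted(scan_cookies({"_ga": "GA1.1.1234567890.1690000000"})))  # ['client_id', 'ga_cookie']
```

This sketch only reverses percent-encoding; handling the URL and payload obfuscation mentioned above would require additional decoding steps before matching, and how per-modality matches are turned into a block decision is left open here.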
We validate SST-Guard on the Tranco top-10k websites, detecting 4.02% (403) sGA domains with over 93% accuracy across three modalities, with the network request classifier achieving the highest accuracy (99.8%). Deploying SST-Guard in the wild, we find that 4.21% (6,314) of the Tranco top-150k websites use sGA.