Measuring the Accuracy and Effectiveness of PII Removal Services

📅 2025-05-11

🤖 AI Summary

Commercial PII deletion services (e.g., DeleteMe, Incogni) claim to remove users’ personal information from data broker databases, yet their efficacy remains unverified by independent empirical evaluation. Method: This work introduces the first large-scale, user-driven evaluation framework—integrating real subscriptions, manual annotation, web scraping for ground-truth comparison, and textual analysis of service claims—to assess coverage, PII identification accuracy, and deletion effectiveness across major services. Results: Only 41.1% of records flagged as PII by services corresponded to users’ actual identities; among verified true PII records, only 48.2% were successfully removed; and all services covered far fewer data brokers than advertised. The study uncovers systemic deficiencies across three dimensions—coverage breadth, classification precision, and operational efficacy—and establishes a reproducible methodology for evaluating privacy-enhancing technologies.

📝 Abstract

This paper presents the first large-scale empirical study of commercial personally identifiable information (PII) removal systems: commercial services that claim to improve privacy by automating the removal of PII from data brokers' databases. Popular examples of such services include DeleteMe, Mozilla Monitor, and Incogni, among many others. The claims these services make may be very appealing to privacy-conscious Web users, but how effective these services actually are at improving privacy has not been investigated. This work aims to improve our understanding of commercial PII removal services in two ways. First, we conduct a user study where participants purchase subscriptions from four popular PII removal services and report (i) what PII the service finds, (ii) from which data brokers, (iii) whether the service is able to have the information removed, and (iv) whether the identified information actually is PII describing the participant. Second, we compare the claims and promises the services make (e.g., which and how many data brokers each service claims to cover) with the results of the user study. We find that these services have significant accuracy and coverage issues that limit their usefulness as a privacy-enhancing technology. For example, the measured services are unable to remove the majority of the identified PII records from data brokers (only 48.2% of the found records were successfully removed), and most records identified by these services are not PII about the user (study participants found that only 41.1% of records identified by these services were PII about themselves).
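The two headline percentages in the abstract are simple ratios over the records that participants reported. The sketch below shows one way such ratios could be computed from annotated records. It is not the authors' analysis code: the `Record` fields, the function names, and the toy data are all hypothetical, and it assumes both percentages use the full set of found records as the denominator.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One record reported by a PII removal service (hypothetical schema)."""
    service: str       # service that reported the record
    broker: str        # data broker the record was found on
    is_user_pii: bool  # participant confirmed the record describes them
    removed: bool      # the record was successfully removed


def share(numerator: int, denominator: int) -> float:
    """Return a percentage, or 0.0 when there is nothing to divide by."""
    return 100.0 * numerator / denominator if denominator else 0.0


def accuracy_pct(records: list) -> float:
    """Percentage of found records that really are PII about the participant."""
    return share(sum(r.is_user_pii for r in records), len(records))


def removal_pct(records: list) -> float:
    """Percentage of found records that were successfully removed."""
    return share(sum(r.removed for r in records), len(records))


# Toy data for illustration only; these are not results from the study.
records = [
    Record("service_a", "broker_1", is_user_pii=True, removed=True),
    Record("service_a", "broker_2", is_user_pii=False, removed=False),
    Record("service_b", "broker_1", is_user_pii=True, removed=False),
    Record("service_b", "broker_3", is_user_pii=False, removed=True),
]

print(f"accuracy: {accuracy_pct(records):.1f}%")
print(f"removal:  {removal_pct(records):.1f}%")
```

Keeping the per-record annotations, rather than only the totals, would also allow the same ratios to be broken down by service or by data broker.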

Problem

Research questions and friction points this paper is trying to address.

Evaluates accuracy of commercial PII removal services

Assesses effectiveness in deleting user PII from brokers

Compares service claims with actual PII removal performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale empirical study of PII removal services

User study with four popular PII removal services

Comparison of service claims versus actual performance
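
Comparing claimed and observed broker coverage amounts to a set comparison per service. The following is a minimal sketch under stated assumptions: the `coverage_gap` helper, the broker names, and the idea of representing coverage as sets of broker identifiers are all illustrative and not taken from the paper.

```python
def coverage_gap(claimed: set, observed: set) -> dict:
    """Compare the brokers a service advertises with those seen in practice.

    claimed:  brokers the service says it covers (hypothetical input)
    observed: brokers that actually appeared in participants' reports
    """
    confirmed = claimed & observed
    return {
        "claimed": len(claimed),
        "observed": len(observed),
        "confirmed": len(confirmed),
        # Advertised brokers that never showed up in any report.
        "unconfirmed": sorted(claimed - observed),
        "confirmed_pct": 100.0 * len(confirmed) / len(claimed) if claimed else 0.0,
    }


# Toy data for illustration only; these are not brokers or figures from the study.
claimed = {"broker_1", "broker_2", "broker_3", "broker_4"}
observed = {"broker_1", "broker_3"}

gap = coverage_gap(claimed, observed)
print(gap)
```

A broker missing from `observed` does not prove the service ignores it, since participants may simply have no record there, so a gap of this kind is best read as a prompt for closer inspection.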
