How do we measure privacy in text? A survey of text anonymization metrics

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text anonymization lacks reliable, regulation-compliant, and user-aligned privacy assessment methods. This work bridges that gap through a systematic literature review and interdisciplinary manual analysis, integrating NLP, legal text analysis, and empirical HCI findings to jointly examine GDPR, HIPAA, and other regulatory frameworks alongside user mental models. We identify and critically analyze six privacy conceptualizations in terms of their effectiveness for risk characterization. Our analysis reveals substantial misalignments among existing metrics, legal requirements, and user expectations. We propose the first privacy metric framework explicitly designed for both regulatory compliance and human-centered design, delineating method-specific applicability boundaries and concrete improvement pathways. The framework delivers actionable guidance for practitioners and advances text privacy evaluation toward greater rigor, cross-study comparability, and interpretability.

📝 Abstract
In this work, we aim to clarify and reconcile metrics for evaluating privacy protection in text through a systematic survey. Although text anonymization is essential for enabling NLP research and model development in domains with sensitive data, evaluating whether anonymization methods sufficiently protect privacy remains an open challenge. In manually reviewing 47 papers that report privacy metrics, we identify and compare six distinct privacy notions, and analyze how the associated metrics capture different aspects of privacy risk. We then assess how well these notions align with legal privacy standards (HIPAA and GDPR), as well as user-centered expectations grounded in HCI studies. Our analysis offers practical guidance on navigating the landscape of privacy evaluation approaches and highlights gaps in current practices. Ultimately, we aim to facilitate more robust, comparable, and legally aware privacy evaluations in text anonymization.
Problem

Research questions and friction points this paper is trying to address.

Surveying metrics for evaluating privacy protection in text anonymization
Assessing alignment of privacy notions with legal standards and user expectations
Providing guidance to improve privacy evaluation approaches in text anonymization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically survey and compare text anonymization privacy metrics
Align privacy notions with legal standards and user expectations
Provide guidance for robust and comparable privacy evaluations