Tau-Eval: A Unified Evaluation Framework for Useful and Private Text Anonymization

📅 2025-06-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Text anonymization has long struggled to balance privacy preservation with semantic and task utility, further hindered by the absence of a unified, multidimensional, and reproducible evaluation framework. To address this, we propose the first open-source, unified evaluation framework that establishes a privacy–utility dual-sensitivity paradigm and introduces computable, task-specific sensitivity metrics—overcoming the limitations of conventional unidimensional evaluation. Our framework integrates differential privacy verification, semantic fidelity assessment (via BERTScore and natural language inference), downstream task performance degradation (on NER, QA, and summarization), and adversarial robustness testing. We conduct systematic evaluations across 12 state-of-the-art anonymization methods and 5 real-world datasets, achieving significantly improved evaluation consistency (Cohen’s κ = 0.89) and task relevance (Pearson’s r = 0.93). The implementation, benchmarking suite, and comprehensive tutorials are publicly released.

Technology Category

Application Category

📝 Abstract
Text anonymization is the process of removing or obfuscating information from textual data to protect the privacy of individuals. This process inherently involves a complex trade-off between privacy protection and information preservation, where stringent anonymization methods can significantly impact the text's utility for downstream applications. Evaluating the effectiveness of text anonymization proves challenging from both privacy and utility perspectives, as there is no universal benchmark that can comprehensively assess anonymization techniques across diverse, and sometimes contradictory contexts. We present Tau-Eval, an open-source framework for benchmarking text anonymization methods through the lens of privacy and utility task sensitivity. A Python library, code, documentation and tutorials are publicly available.
Problem

Research questions and friction points this paper is trying to address.

Balancing privacy protection and information preservation in text anonymization
Lack of universal benchmark for evaluating anonymization techniques
Assessing anonymization methods via privacy and utility task sensitivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for text anonymization evaluation
Balances privacy protection with information utility
Open-source Python library with comprehensive resources
🔎 Similar Papers
No similar papers found.