Tau-Eval: A Unified Evaluation Framework for Useful and Private Text Anonymization

📅 2025-06-06

🤖 AI Summary

Text anonymization has long struggled to balance privacy preservation with semantic and task utility, further hindered by the absence of a unified, multidimensional, and reproducible evaluation framework. To address this, we propose the first open-source, unified evaluation framework that establishes a privacy–utility dual-sensitivity paradigm and introduces computable, task-specific sensitivity metrics—overcoming the limitations of conventional unidimensional evaluation. Our framework integrates differential privacy verification, semantic fidelity assessment (via BERTScore and natural language inference), downstream task performance degradation (on NER, QA, and summarization), and adversarial robustness testing. We conduct systematic evaluations across 12 state-of-the-art anonymization methods and 5 real-world datasets, achieving significantly improved evaluation consistency (Cohen’s κ = 0.89) and task relevance (Pearson’s r = 0.93). The implementation, benchmarking suite, and comprehensive tutorials are publicly released.

📝 Abstract

Text anonymization is the process of removing or obfuscating information from textual data to protect the privacy of individuals. This process inherently involves a complex trade-off between privacy protection and information preservation, where stringent anonymization methods can significantly reduce the text's utility for downstream applications. Evaluating the effectiveness of text anonymization is challenging from both privacy and utility perspectives, as there is no universal benchmark that can comprehensively assess anonymization techniques across diverse and sometimes contradictory contexts. We present Tau-Eval, an open-source framework for benchmarking text anonymization methods through the lens of privacy and utility task sensitivity. A Python library, code, documentation, and tutorials are publicly available.
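To make the privacy-utility trade-off described in the abstract concrete, here is a minimal, self-contained sketch. It does not use the Tau-Eval library or reflect its API. The functions `mask_entities`, `privacy_score`, and `utility_score`, the example sentence, and the two metrics (share of sensitive strings removed, and Jaccard word overlap) are all assumptions chosen purely for illustration.

```python
import re


def mask_entities(text, entities, placeholder="[REDACTED]"):
    """Toy anonymizer (hypothetical): replace each known sensitive string."""
    for entity in entities:
        text = re.sub(re.escape(entity), placeholder, text, flags=re.IGNORECASE)
    return text


def privacy_score(anonymized, entities):
    """Fraction of sensitive strings that no longer appear in the text."""
    if not entities:
        return 1.0
    lowered = anonymized.lower()
    hidden = sum(1 for entity in entities if entity.lower() not in lowered)
    return hidden / len(entities)


def utility_score(original, anonymized):
    """Jaccard overlap of lowercase word sets, a crude proxy for utility."""
    original_words = set(re.findall(r"\w+", original.lower()))
    anonymized_words = set(re.findall(r"\w+", anonymized.lower()))
    if not original_words and not anonymized_words:
        return 1.0
    shared = original_words & anonymized_words
    return len(shared) / len(original_words | anonymized_words)


# Illustrative input: two sensitive strings in one sentence.
text = "Alice Martin visited the Berlin clinic on Monday."
entities = ["Alice Martin", "Berlin"]

# A light method masks only the name; a strict method masks everything listed.
light = mask_entities(text, ["Alice Martin"])
strict = mask_entities(text, entities)

for label, output in [("light", light), ("strict", strict)]:
    print(label, privacy_score(output, entities), round(utility_score(text, output), 3))
```

In this toy setting the stricter method scores higher on privacy and lower on utility, which is the tension a benchmark has to measure. A real evaluation would replace both metrics with task-specific ones, such as an attacker model for privacy and downstream task performance for utility.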

Problem

Research questions and friction points this paper is trying to address.

Balancing privacy protection and information preservation in text anonymization

Lack of universal benchmark for evaluating anonymization techniques

Assessing anonymization methods via privacy and utility task sensitivity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework for text anonymization evaluation

Balances privacy protection with information utility

Open-source Python library with comprehensive resources

🔎 Similar Papers

IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization