Mask-Free Privacy Extraction and Rewriting: A Domain-Aware Approach via Prototype Learning

📅 2026-04-11

🤖 AI Summary
This work addresses a longstanding challenge in privacy-preserving text rewriting: balancing strict privacy guarantees against the preservation of semantic utility. Existing approaches disrupt contextual coherence through full-text modification, rely on manual masking or static dictionaries for span-level redaction, or depend on large language models whose unstable instruction following leads to either privacy leakage or over-redaction. To overcome these limitations, the paper proposes DAMPER, a framework that introduces domain-specific privacy prototypes for the first time. Using contrastive learning, DAMPER distills privacy-sensitive semantics into compact prototype representations, enabling precise, mask-free localization of private spans. It further combines prototype-guided preference alignment, which requires no human annotations, with a sampling-based exponential mechanism that provides provable span-level differential privacy at inference time. Experiments across multiple privacy-sensitive domains show that DAMPER significantly outperforms state-of-the-art methods, safeguarding privacy while maintaining high semantic fidelity and contextual utility.
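
The mask-free localization idea in the summary can be illustrated with a small sketch. This is only an assumed reading, not the paper's implementation: it supposes that each candidate span already has an embedding vector, that the learned prototypes live in the same vector space, and that a span is flagged when its cosine similarity to the nearest prototype passes a threshold. The function names, the threshold value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def locate_sensitive_spans(span_embeddings, prototypes, threshold=0.8):
    """Return indices of spans whose nearest prototype is at least `threshold` similar.

    Assumption: nearest-prototype thresholding stands in for whatever
    scoring rule the paper actually uses; it is not taken from the source.
    """
    flagged = []
    for index, embedding in enumerate(span_embeddings):
        best = max(cosine(embedding, proto) for proto in prototypes)
        if best >= threshold:
            flagged.append(index)
    return flagged


# Toy 2-D vectors: spans 0 and 2 point roughly along the single prototype.
spans = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
prototypes = [[1.0, 0.0]]
flagged = locate_sensitive_spans(spans, prototypes, threshold=0.8)
```

Because no mask or dictionary is consulted, which spans get flagged depends entirely on the prototype vectors, and in this sketch those are simply given rather than learned.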

📝 Abstract
Client-side privacy rewriting is crucial for deploying LLMs in privacy-sensitive domains. However, existing approaches struggle to balance privacy and utility. Full-text methods often distort context, while span-level approaches rely on impractical manual masks or brittle static dictionaries. Attempts to automate localization via prompt-based LLMs prove unreliable, as they suffer from unstable instruction following that leads to privacy leakage and excessive context scrubbing. To address these limitations, we propose DAMPER (Domain-Aware Mask-free Privacy Extraction and Rewriting). DAMPER operationalizes latent privacy semantics into compact Domain Privacy Prototypes via contrastive learning, enabling precise, autonomous span localization. Furthermore, we introduce a Prototype-Guided Preference Alignment, which leverages learned prototypes as semantic anchors to construct preference pairs, optimizing a domain-compliant rewriting policy without human annotations. At inference time, DAMPER integrates a sampling-based Exponential Mechanism to provide rigorous span-level Differential Privacy (DP) guarantees. Extensive experiments demonstrate that DAMPER significantly outperforms existing baselines, achieving a superior privacy-utility trade-off.
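
As a rough illustration of the inference-time step, the following is a minimal sketch of the standard exponential mechanism, which samples a candidate with probability proportional to exp(ε · u / (2 · Δu)). The abstract does not say how DAMPER scores candidates or bounds sensitivity, so the utility values, the `sensitivity` default, and the function names here are assumptions made for illustration only.

```python
import math
import random


def selection_probabilities(utilities, epsilon, sensitivity=1.0):
    """Exponential-mechanism probabilities: p_i proportional to exp(eps * u_i / (2 * sensitivity))."""
    scaled = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_replacement(candidates, utilities, epsilon, sensitivity=1.0, rng=None):
    """Draw one replacement for a sensitive span.

    Assumption: `utilities` are bounded scores with the given sensitivity;
    how the paper computes them is not stated in the abstract.
    """
    rng = rng or random.Random()
    probs = selection_probabilities(utilities, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical candidates for one span, with made-up utility scores.
candidates = ["a hospital", "a clinic", "a building"]
utilities = [0.9, 0.5, 0.1]
probs = selection_probabilities(utilities, epsilon=2.0)
choice = sample_replacement(candidates, utilities, epsilon=2.0, rng=random.Random(0))
```

A smaller ε flattens the distribution toward uniform (stronger privacy, lower utility), while a larger ε concentrates probability on the highest-utility candidate.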
Problem

Research questions and friction points this paper is trying to address.

privacy rewriting
privacy-utility trade-off
span-level privacy
client-side privacy
differential privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mask-Free Privacy Rewriting
Domain Privacy Prototypes
Prototype-Guided Preference Alignment
Contrastive Learning
Differential Privacy
Xiaodong Li
Center for the Applied Statistics, School of Statistics, Renmin University of China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing; School of Artificial Intelligence, Beihang University, China
Yuhua Wang
Ford Foundation Professor of Modern China Studies at Harvard University
Political Science
Qingchen Yu
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing; School of Artificial Intelligence, Beihang University, China
Zixuan Qin
Center for the Applied Statistics, School of Statistics, Renmin University of China
Yifan Sun
Center for the Applied Statistics, School of Statistics, Renmin University of China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing
Qinnan Zhang
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing; School of Artificial Intelligence, Beihang University, China
Hainan Zhang
Beihang University
Dialogue Generation, Text Generation, Federated Learning, Natural Language Processing
Zhiming Zheng
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing; School of Artificial Intelligence, Beihang University, China