Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Models

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of using large language models (LLMs) to evaluate the privacy sensitivity of text, a cost that hinders their deployment on sensitive data at scale. The authors propose a knowledge distillation approach that transfers the privacy judgment capability of Mistral Large 3 (675B) into lightweight encoder models with as few as 150 million parameters. The distilled models are trained on a large-scale, privacy-annotated dataset spanning 10 domains and show strong agreement with human judgments on a held-out test set, achieving substantial reductions in computational overhead while preserving assessment fidelity. The practicality of the resulting model is further validated by integrating it into a de-identification system as a metric of privacy-preserving performance.
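The paper does not publish its training code, but the distillation step it describes — training a small student on the teacher's soft privacy judgments — typically uses a temperature-scaled KL-divergence objective. The sketch below is a generic, illustrative implementation of that loss (function names, the temperature value, and the two-class setup are assumptions, not details from the paper):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over the last axis (numerically stable).
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened class distributions,
    scaled by T^2 as in standard knowledge distillation. Here the classes
    could be e.g. {not sensitive, sensitive} privacy labels."""
    p = softmax(teacher_logits, T)  # soft targets from the large teacher LLM
    q = softmax(student_logits, T)  # predictions from the small encoder student
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()
```

When the student's logits match the teacher's, the loss is zero; it grows as the two distributions diverge, which is what drives the 150M-parameter encoder toward the teacher's privacy judgments during training.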
📝 Abstract
Accurate privacy evaluation of textual data remains a critical challenge in privacy-preserving natural language processing. Recent work has shown that large language models (LLMs) can serve as reliable privacy evaluators, achieving strong agreement with human judgments; however, their computational cost and impracticality for processing sensitive data at scale limit real-world deployment. We address this gap by distilling the privacy assessment capabilities of Mistral Large 3 (675B) into lightweight encoder models with as few as 150M parameters. Leveraging a large-scale dataset of privacy-annotated texts spanning 10 diverse domains, we train efficient classifiers that preserve strong agreement with human annotations while dramatically reducing computational requirements. We validate our approach on human-annotated test data and demonstrate its practical utility as an evaluation metric for de-identification systems.
Problem

Research questions and friction points this paper is trying to address.

privacy sensitivity assessment
large language models
human-aligned evaluation
privacy-preserving NLP
text de-identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge distillation
privacy sensitivity assessment
large language models
lightweight encoder
de-identification evaluation