AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing anonymization methods for user-generated text suffer from either significant utility loss or reliance on cloud-based large language models (LLMs), which introduces new privacy risks. Method: We propose AgentStealth, a lightweight, fully local anonymization framework that integrates context-aware contrastive learning, utility-aware control, and online adversarial reinforcement learning, enabling continuous model refinement via internal adversarial feedback. The method trains a small language model (SLM) via supervised fine-tuning on high-quality anonymization and attack-signal data collected from the workflow, eliminating cloud dependency. Contribution/Results: Evaluated on two benchmark datasets, AgentStealth improves anonymization effectiveness by 12.3% and utility by 6.8% over baselines. Its compact parameter count enables efficient deployment on edge devices. The implementation is open-sourced to support reproducibility and extensibility.

📝 Abstract
In today's digital world, casual user-generated content often contains subtle cues that may inadvertently expose sensitive personal attributes. Such risks underscore the growing importance of effective text anonymization to safeguard individual privacy. However, existing methods either rely on rigid replacements that damage utility or on cloud-based LLMs that are costly and pose privacy risks. To address these issues, we explore the use of locally deployed smaller-scale language models (SLMs) for anonymization. Yet training effective SLMs remains challenging due to limited high-quality supervision. To address this challenge, we propose AgentStealth, a self-reinforcing LLM anonymization framework. First, we introduce an adversarial anonymization workflow enhanced by In-context Contrastive Learning and Adaptive Utility-Aware Control. Second, we perform supervised adaptation of SLMs using high-quality data collected from the workflow, which includes both anonymization and attack signals. Finally, we apply online reinforcement learning where the model leverages its internal adversarial feedback to iteratively improve anonymization performance. Experiments on two datasets show that our method outperforms baselines in both anonymization effectiveness (+12.3%) and utility (+6.8%). Our lightweight design supports direct deployment on edge devices, avoiding cloud reliance and communication-based privacy risks. Our code is open-source at https://github.com/tsinghua-fib-lab/AgentStealth.
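The adversarial anonymization workflow from the abstract can be sketched as a simple loop: an attacker model tries to infer a sensitive attribute from the text, and an anonymizer rewrites the leaked span until the attack fails or a round budget is exhausted. This is a minimal illustrative sketch, not the paper's implementation: both roles are stubbed with toy rules here (the names `attacker_infer`, `anonymizer_rewrite`, and the city list are all assumptions), whereas the real system would use (S)LM inference at both roles.

```python
from typing import Optional

def attacker_infer(text: str) -> Optional[str]:
    """Toy attacker: flags an explicit city mention as a leaked location attribute."""
    for city in ("Berlin", "Paris", "Tokyo"):
        if city in text:
            return f"location={city}"
    return None

def anonymizer_rewrite(text: str, leaked: str) -> str:
    """Toy anonymizer: generalizes the leaked span rather than deleting it,
    since rigid deletion would damage utility."""
    city = leaked.split("=", 1)[1]
    replacement = "a large Asian city" if city == "Tokyo" else "a large European city"
    return text.replace(city, replacement)

def adversarial_anonymize(text: str, max_rounds: int = 3) -> str:
    """Rewrite until the internal attacker can no longer infer the attribute."""
    for _ in range(max_rounds):
        leaked = attacker_infer(text)
        if leaked is None:  # attack failed: stop rewriting to preserve utility
            break
        text = anonymizer_rewrite(text, leaked)
    return text

print(adversarial_anonymize("I grabbed coffee near my office in Berlin yesterday."))
```

Note the design choice implied by the abstract: the loop stops as soon as the attack fails, so the text is edited only as much as privacy requires.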
Problem

Research questions and friction points this paper is trying to address.

Anonymizing user text to protect sensitive personal attributes
Overcoming rigid replacements and cloud-based LLM privacy risks
Training effective small-scale models with limited supervision data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Locally deployed small-scale language models for anonymization
Adversarial anonymization with contrastive learning and adaptive control
Online reinforcement learning using internal adversarial feedback
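The last bullet, online RL from internal adversarial feedback, amounts to rewarding the anonymizer SLM when its own attacker fails while penalizing semantic drift. A hypothetical reward-shaping sketch is below; the inputs (`attack_success_prob`, `semantic_similarity`) and the linear weighting are assumptions, since this page does not specify the paper's exact reward.

```python
def anonymization_reward(attack_success_prob: float,
                         semantic_similarity: float,
                         privacy_weight: float = 0.7) -> float:
    """Reward is high when the internal attacker fails (low success probability)
    and the rewrite stays close to the original meaning (high similarity).
    The 0.7 privacy weight is an illustrative assumption, not the paper's value."""
    privacy_term = 1.0 - attack_success_prob
    utility_term = semantic_similarity
    return privacy_weight * privacy_term + (1.0 - privacy_weight) * utility_term

# A rewrite that defeats the attacker with little meaning loss scores high;
# one the attacker still cracks scores low even if similarity is perfect.
print(anonymization_reward(0.1, 0.9))  # -> 0.9
print(anonymization_reward(0.9, 1.0))  # -> 0.37
```

Scoring both terms in one scalar is what lets the RL stage balance the two failure modes the abstract names: over-editing (utility loss) and under-editing (attribute leakage).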