Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification

📅 2025-03-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the dual role of large language models (LLMs) in authorship verification—both enabling and undermining its reliability—by systematically evaluating the adversarial robustness of such systems. We propose two novel LLM-driven attack frameworks: (1) untargeted style masking, which conceals the true author while preserving semantics, and (2) targeted style imitation, which impersonates a specified author under strict semantic fidelity constraints. Our approach integrates controllable text rewriting, style-transfer prompting, and white-box/gray-box perturbation strategies. Evaluated on standard authorship verification models, the masking attack achieves a 92% success rate, and the imitation attack reaches 78%, substantially outperforming conventional text perturbation baselines. This study provides the first systematic characterization of critical vulnerability boundaries for authorship verification in the LLM era, establishing both theoretical foundations and practical methodologies for developing robust, trustworthy AI-assisted content provenance systems.
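The paper itself does not provide code here; as a rough illustration of what an LLM-driven rewrite step could look like for the two attack settings (untargeted style masking vs. targeted style imitation under a semantic-fidelity constraint), the sketch below uses the OpenAI Python client. The model name, prompt wording, and the rewrite function are illustrative assumptions, not the authors' actual attack pipeline.

```python
# Hypothetical sketch: LLM-prompted rewriting for obfuscation / impersonation.
# Model name, prompts, and function signature are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def rewrite(text: str, target_style: str | None = None, model: str = "gpt-4o-mini") -> str:
    """Rewrite `text` to mask the author's style (untargeted obfuscation), or to
    mimic a `target_style` exemplar (targeted impersonation), preserving meaning."""
    if target_style is None:
        instruction = (
            "Rewrite the following text so that its writing style no longer "
            "identifies the original author. Keep the meaning unchanged."
        )
    else:
        instruction = (
            "Rewrite the following text in the style of the example passage below, "
            f"keeping the meaning unchanged.\n\nStyle example:\n{target_style}"
        )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a careful text rewriting assistant."},
            {"role": "user", "content": f"{instruction}\n\nText:\n{text}"},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content
```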

📝 Abstract
The increasing use of Artificial Intelligence (AI) technologies, such as Large Language Models (LLMs), has led to nontrivial improvements in various tasks, including accurate authorship identification of documents. However, while LLMs improve such defense techniques, they also provide a vehicle for malicious actors to launch new attack vectors. To combat this security risk, we evaluate the adversarial robustness of authorship models (specifically an authorship verification model) against potent LLM-based attacks. These attacks include an untargeted method, authorship obfuscation, and a targeted method, authorship impersonation. The objective is to mask or mimic the writing style of an author, respectively, while preserving the original text's semantics. We perturb an accurate authorship verification model and achieve maximum attack success rates of 92% for the obfuscation attack and 78% for the impersonation attack.
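As a hedged sketch of how attack success might be scored against a verification model: an obfuscation attempt counts as a success when the verifier no longer matches the rewritten text to its true author, while the rewrite remains semantically close to the original. The verifier and embedding callables and the 0.5 / 0.8 thresholds below are assumptions for illustration, not the paper's reported evaluation protocol.

```python
# Hypothetical evaluation sketch: fraction of rewrites that fool the verifier
# while staying semantically faithful. Thresholds and interfaces are assumptions.
from typing import Callable, Sequence
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def obfuscation_success_rate(
    pairs: Sequence[tuple[str, str]],        # (author_reference_text, original_text)
    rewrite: Callable[[str], str],           # e.g. an LLM-based rewrite function
    verify: Callable[[str, str], float],     # same-author score in [0, 1]
    embed: Callable[[str], np.ndarray],      # semantic embedding of a text
    verify_threshold: float = 0.5,
    semantic_threshold: float = 0.8,
) -> float:
    """Fraction of pairs where the rewrite breaks verification but keeps semantics."""
    successes = 0
    for reference, original in pairs:
        rewritten = rewrite(original)
        fooled = verify(reference, rewritten) < verify_threshold
        faithful = cosine(embed(original), embed(rewritten)) >= semantic_threshold
        successes += fooled and faithful
    return successes / len(pairs)
```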
Problem

Research questions and friction points this paper is trying to address.

Evaluating adversarial robustness of authorship verification models
Testing LLM-based obfuscation and impersonation attack methods
Measuring attack success rates on authorship verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates adversarial robustness of authorship models
Uses LLM-based obfuscation and impersonation attacks
Achieves high attack success rates up to 92%