Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification

📅 2025-03-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the dual role of large language models (LLMs) in authorship verification—both enabling and undermining its reliability—by systematically evaluating the adversarial robustness of such systems. We propose two novel LLM-driven attack frameworks: (1) untargeted style masking, which conceals the true author while preserving semantics, and (2) targeted style imitation, which impersonates a specified author under strict semantic fidelity constraints. Our approach integrates controllable text rewriting, style-transfer prompting, and white-box/gray-box perturbation strategies. Evaluated on standard authorship verification models, the masking attack achieves a 92% success rate, and the imitation attack reaches 78%, substantially outperforming conventional text perturbation baselines. This study provides the first systematic characterization of critical vulnerability boundaries for authorship verification in the LLM era, establishing both theoretical foundations and practical methodologies for developing robust, trustworthy AI-assisted content provenance systems.
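The paper itself does not provide code here; as a rough illustration of what an LLM-driven rewrite step could look like for the two attack settings (untargeted style masking vs. targeted style imitation under a semantic-fidelity constraint), the sketch below uses the OpenAI Python client. The model name, prompt wording, and the rewrite function are illustrative assumptions, not the authors' actual attack pipeline.

```python
# Hypothetical sketch: LLM-prompted rewriting for obfuscation / impersonation.
# Model name, prompts, and function signature are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def rewrite(text: str, target_style: str | None = None, model: str = "gpt-4o-mini") -> str:
    """Rewrite `text` to mask the author's style (untargeted obfuscation), or to
    mimic a `target_style` exemplar (targeted impersonation), preserving meaning."""
    if target_style is None:
        instruction = (
            "Rewrite the following text so that its writing style no longer "
            "identifies the original author. Keep the meaning unchanged."
        )
    else:
        instruction = (
            "Rewrite the following text in the style of the example passage below, "
            f"keeping the meaning unchanged.\n\nStyle example:\n{target_style}"
        )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a careful text rewriting assistant."},
            {"role": "user", "content": f"{instruction}\n\nText:\n{text}"},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content
```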

📝 Abstract
The increasing use of Artificial Intelligence (AI) technologies, such as Large Language Models (LLMs), has led to nontrivial improvements in various tasks, including accurate authorship identification of documents. However, while LLMs improve such defense techniques, they also provide a vehicle for malicious actors to launch new attack vectors. To combat this security risk, we evaluate the adversarial robustness of authorship models (specifically an authorship verification model) against potent LLM-based attacks. These attacks include an untargeted method, authorship obfuscation, and a targeted method, authorship impersonation. The objective is to mask or mimic the writing style of an author, respectively, while preserving the original text's semantics. We perturb an accurate authorship verification model and achieve maximum attack success rates of 92% for the obfuscation attack and 78% for the impersonation attack.
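As a hedged sketch of how attack success might be scored against a verification model: an obfuscation attempt counts as a success when the verifier no longer matches the rewritten text to its true author, while the rewrite remains semantically close to the original. The verifier and embedding callables and the 0.5 / 0.8 thresholds below are assumptions for illustration, not the paper's reported evaluation protocol.

```python
# Hypothetical evaluation sketch: fraction of rewrites that fool the verifier
# while staying semantically faithful. Thresholds and interfaces are assumptions.
from typing import Callable, Sequence
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def obfuscation_success_rate(
    pairs: Sequence[tuple[str, str]],        # (author_reference_text, original_text)
    rewrite: Callable[[str], str],           # e.g. an LLM-based rewrite function
    verify: Callable[[str, str], float],     # same-author score in [0, 1]
    embed: Callable[[str], np.ndarray],      # semantic embedding of a text
    verify_threshold: float = 0.5,
    semantic_threshold: float = 0.8,
) -> float:
    """Fraction of pairs where the rewrite breaks verification but keeps semantics."""
    successes = 0
    for reference, original in pairs:
        rewritten = rewrite(original)
        fooled = verify(reference, rewritten) < verify_threshold
        faithful = cosine(embed(original), embed(rewritten)) >= semantic_threshold
        successes += fooled and faithful
    return successes / len(pairs)
```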
Problem

Research questions and friction points this paper is trying to address.

Evaluating adversarial robustness of authorship verification models
Testing LLM-based obfuscation and impersonation attack methods
Measuring attack success rates on authorship verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates adversarial robustness of authorship models
Uses LLM-based obfuscation and impersonation attacks
Achieves high attack success rates up to 92%