Beating the Style Detector: Three Hours of Agentic Research on the AI-Text Arms Race

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks whether large language models (LLMs) can match a human author's personal style when post-editing drafts, and whether that ability lets them evade AI-text detectors. Using an agent-driven experimental framework, the work reproduces and extends an ACL 2026 study within three hours, with the human investigator acting only as a reviewer-in-the-loop. Under a leakage-free held-out protocol, GPT-5.5 and Claude Opus 4.7 close 71–75% of the style gap to the same-author ceiling across 324 paired tasks, compared with 24% for the human post-edit. A leave-authors-out linear SVM on LUAR-MUD embeddings detects the model outputs with AUC 0.93–1.00, but six diagnostics show that GPT-5.5 detection is mostly a length confound, whereas Opus detection reflects a genuine stylistic signature. After 20 feedback iterations against the frozen detector, an Opus agent moves 2 of 5 held-out test mimics into the human half-space and shrinks every margin by an order of magnitude. The study reproduces all seven preregistered hypotheses and recovers the original paper's headline correlation between perceived and embedding-measured self-similarity (r = +0.244).
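The "share of the style gap closed" figure can be read as a simple ratio. The sketch below is a minimal illustration under stated assumptions, not the study's implementation: the function name `gap_closed`, the choice of the unedited draft as the baseline, and the example scores are all hypothetical.

```python
def gap_closed(baseline: float, candidate: float, ceiling: float) -> float:
    """Return the fraction of the baseline-to-ceiling gap that a candidate closes.

    Assumed (hypothetical) reading of the metric:
      baseline  = style similarity of the unedited draft to the target author
      candidate = style similarity of the post-edited text to the target author
      ceiling   = similarity between two texts by the same author
    """
    gap = ceiling - baseline
    if gap <= 0:
        raise ValueError("ceiling must be strictly greater than baseline")
    return (candidate - baseline) / gap


# Hypothetical similarity scores, not values taken from the study.
share = gap_closed(baseline=0.25, candidate=0.625, ceiling=0.75)
print(f"{share:.0%} of the gap closed")
```

A value of 0 means the edit left the text no closer to the author than the draft, 1 means it reached the same-author ceiling, and negative values mean the edit moved away from the author's style.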

📝 Abstract

Reproducing an empirical NLP study used to take weeks. Given the released data and a modern agentic-research harness, we redo every experiment of a recent ACL 2026 study on personal-style post-editing of LLM drafts, and add three new ones, with the human investigator acting only as a reviewer-in-the-loop. We reproduce all seven preregistered hypotheses and recover the paper's headline correlation between perceived self-similarity and embedding-measured self-similarity to three decimal places ($r = +0.244$, $p < 10^{-8}$, $n = 648$). Under a leakage-free held-out protocol, GPT-5.5 and Claude Opus 4.7 close 71–75% of the style gap to the same-author ceiling on 324 paired tasks, against 24% for the human post-edit, and beat the human post-edit on about 80% of tasks. We then frame the same data as an AI-text detection arms race. A leave-authors-out linear SVM on LUAR-MUD embeddings reaches AUC 0.93–1.00 across approaches; six diagnostics show that GPT-5.5 detection is mostly a length confound while Opus detection is a genuine stylistic signature. Given $T = 20$ feedback iterations against the frozen detector, an Opus agent flips two of five held-out test mimics to the human half-space and shrinks every margin by an order of magnitude. With moderate effort against a known detector, a frontier LLM can already efficiently lower its own AI-detection probability. All code, 648 mimic drafts, trained detectors, diagnostics, and adversarial trajectories are released.
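The leave-authors-out protocol mentioned above keeps every author's texts on one side of each train/test split, so a detector cannot score well simply by recognizing authors it has already seen. The following is a minimal standard-library sketch of such a split only; it does not train the linear SVM or compute LUAR-MUD embeddings, and the helper name `leave_authors_out_folds` and the toy author labels are hypothetical.

```python
from collections import defaultdict


def leave_authors_out_folds(authors, n_folds):
    """Yield (train_idx, test_idx) pairs in which no author appears on both sides.

    `authors` holds one author label per sample; authors are assigned to
    folds round-robin in sorted order (an illustrative choice).
    """
    by_author = defaultdict(list)
    for idx, author in enumerate(authors):
        by_author[author].append(idx)

    unique = sorted(by_author)
    if not 2 <= n_folds <= len(unique):
        raise ValueError("n_folds must be between 2 and the number of authors")

    for fold in range(n_folds):
        held_out = set(unique[fold::n_folds])  # authors reserved for testing
        test = [i for a in unique if a in held_out for i in by_author[a]]
        train = [i for a in unique if a not in held_out for i in by_author[a]]
        yield train, test


# Toy labels for illustration only.
authors = ["a", "a", "b", "c", "c", "d"]
for train_idx, test_idx in leave_authors_out_folds(authors, n_folds=2):
    print("train:", train_idx, "test:", test_idx)
```

In practice a grouped splitter such as scikit-learn's `GroupKFold` serves the same purpose; the hand-rolled version is shown only to make the constraint explicit.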

Problem

Research questions and friction points this paper is trying to address.

AI-text detection

style mimicry

adversarial text generation

LLM post-editing

stylistic signature

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic research

AI-text detection arms race

style mimicry

adversarial feedback iterations against a frozen detector

LUAR-MUD embeddings

🔎 Similar Papers

Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

2024-06-21 · Journal of Artificial Intelligence Research · Citations: 6