MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization

📅 2026-01-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets the fragility of current AI-generated text detectors under black-box adversarial settings, where most existing attack methods either require white-box access or incur high computational and interaction costs. The authors propose MASH, a framework that chains style-injection supervised fine-tuning, Direct Preference Optimization (DPO), and inference-time refinement to mount low-cost, style-based attacks against black-box detectors. Experiments across six datasets and five detectors show that MASH achieves an average attack success rate (ASR) of 92%, surpassing the strongest baselines by an average of 24%, while preserving high linguistic quality.
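For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-pair DPO loss. It is not taken from the paper: how MASH builds its preference pairs, which reference model it uses, and its value of `beta` are not stated on this page, so the function name, arguments, and default are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities of the preferred ("chosen") and
    dispreferred ("rejected") responses under the trained policy and under
    a frozen reference model. beta=0.1 is an assumed default, not a value
    reported for MASH.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # loss = -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

In a style-humanization setting one would presumably treat the more human-looking rewrite as the chosen response, but that pairing is an assumption here rather than a detail confirmed by the summary.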

📝 Abstract

The increasing misuse of AI-generated texts (AIGT) has motivated the rapid development of AIGT detection methods. However, the reliability of these detectors remains fragile against adversarial evasions. Existing attack strategies often rely on white-box assumptions or demand prohibitively high computational and interaction costs, rendering them ineffective under practical black-box scenarios. In this paper, we propose Multi-stage Alignment for Style Humanization (MASH), a novel framework that evades black-box detectors based on style transfer. MASH sequentially employs style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement to shape the distributions of AI-generated texts to resemble those of human-written texts. Experiments across 6 datasets and 5 detectors demonstrate the superior performance of MASH over 11 baseline evaders. Specifically, MASH achieves an average Attack Success Rate (ASR) of 92%, surpassing the strongest baselines by an average of 24%, while maintaining superior linguistic quality.
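The abstract reports results as Attack Success Rate (ASR) without defining it here. A common reading, assumed in this sketch, is the fraction of evaded AI-generated texts that a detector labels as human. The function name, the score convention (higher means "more likely AI"), and the 0.5 threshold are all illustrative assumptions, not details from the paper.

```python
def attack_success_rate(detector_scores, threshold=0.5):
    """Fraction of evaded texts that the detector classifies as human.

    detector_scores: detector outputs for AI-generated texts after evasion,
    where a higher score means "more likely AI" (assumed convention).
    threshold: decision boundary; 0.5 is an assumption, and real detectors
    are often calibrated differently.
    """
    if not detector_scores:
        raise ValueError("need at least one detector score")
    # A text evades detection when its score falls below the threshold.
    evaded = sum(1 for score in detector_scores if score < threshold)
    return evaded / len(detector_scores)
```

For example, `attack_success_rate([0.1, 0.9, 0.3, 0.2])` counts three of four texts as evading detection. An average ASR like the one reported would then be a mean of such values over dataset and detector combinations, though the paper's exact aggregation is not given on this page.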

Problem

Research questions and friction points this paper is trying to address.

AI-generated text detection

black-box evasion

adversarial attack

style transfer

text humanization

Innovation

Methods, ideas, or system contributions that make the work stand out.

style transfer

black-box evasion

AI-generated text detection

human-like text generation

adversarial attack

🔎 Similar Papers

Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

2024-06-21 · Journal of Artificial Intelligence Research · Citations: 6
