🤖 AI Summary
This study addresses the vulnerability of AI text detectors to character-level manipulations. We propose and empirically validate the first systematic black-box adversarial attack leveraging Unicode homoglyphs—visually identical yet semantically distinct characters. By applying fine-grained, semantics-preserving homoglyph substitutions to AI-generated text, our method degrades detector performance without compromising readability or meaning, reducing the Matthews Correlation Coefficient (MCC) from 0.64 to −0.01. The attack achieves near-total evasion across seven state-of-the-art detectors—including OpenAI’s official classifier and watermark-based detectors—and five diverse, cross-domain datasets, with average MCC approaching zero. Rigorous robustness evaluation and attribution of each detector's internal behavior confirm strong generalization across models and datasets. Our work exposes a fundamental flaw in current detection paradigms: overreliance on superficial character-level signals. It provides critical insights and empirical evidence for developing more robust, semantics-aware AI content authentication mechanisms.
📝 Abstract
The advent of Large Language Models (LLMs) has enabled the generation of text that increasingly exhibits human-like characteristics. As the detection of such content is of significant importance, substantial research has been conducted with the objective of developing reliable AI-generated text detectors. These detectors have demonstrated promising results on test data, but recent research has revealed that they can be circumvented by employing different techniques. In this paper, we present homoglyph-based attacks (Latin A $\rightarrow$ Cyrillic A) as a means of circumventing existing detectors. We conduct a comprehensive evaluation to assess the effectiveness of these attacks on seven detectors, including ArguGPT, Binoculars, DetectGPT, Fast-DetectGPT, Ghostbuster, OpenAI's detector, and watermarking techniques, on five different datasets. Our findings demonstrate that homoglyph-based attacks can effectively circumvent state-of-the-art detectors, leading them to classify all texts as either AI-generated or human-written (decreasing the average Matthews Correlation Coefficient from 0.64 to -0.01). Through further examination, we identify the technical reasons underlying the attacks' success, which vary across detectors. Finally, we discuss the implications of these findings and potential defenses against such attacks.
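The core idea of a homoglyph substitution can be sketched in a few lines. The mapping and substitution policy below are illustrative assumptions, not the paper's actual attack configuration; they simply show how swapping Latin characters for visually identical Cyrillic code points yields text that renders the same but tokenizes differently:

```python
# Minimal sketch of a homoglyph-based substitution (illustrative only).
# The Latin->Cyrillic pairs below are a small subset of known homoglyphs;
# the paper's actual substitution strategy and coverage may differ.
HOMOGLYPHS = {
    "A": "\u0410",  # Cyrillic Capital Letter A
    "a": "\u0430",  # Cyrillic Small Letter A
    "e": "\u0435",  # Cyrillic Small Letter Ie
    "o": "\u043e",  # Cyrillic Small Letter O
    "c": "\u0441",  # Cyrillic Small Letter Es
}

def apply_homoglyphs(text: str) -> str:
    """Replace every mapped Latin character with its Cyrillic look-alike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

original = "AI-generated content"
attacked = apply_homoglyphs(original)
print(attacked)              # renders identically on screen...
print(original == attacked)  # ...but is a different byte sequence: False
```

Because detectors and watermark verifiers operate on token or character sequences rather than rendered glyphs, even a few such substitutions can shift the statistics they rely on while leaving the text visually unchanged for human readers.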