🤖 AI Summary
Existing detectors of AI-generated text generalize poorly in open-world settings, struggling with unseen domains and non-factual content. Method: "Learning to Rewrite" replaces direct classification with an LLM rewriting task, leveraging the observation that LLMs make minimal edits when rewriting AI-generated text but substantial edits when rewriting human-written text; training the LLM to minimize alterations on AI-generated inputs amplifies this disparity, yielding an edit-distance signal that discriminates reliably across diverse text distributions. Contribution/Results: Evaluated on data from 21 independent domains and four major LLMs (GPT-3.5, GPT-4, Gemini, and Llama-3), the method improves AUROC over state-of-the-art detectors by up to 23.04% in-distribution, 37.26% out-of-distribution, and 48.66% under adversarial attacks, and generalizes better than direct classification training with the same number of parameters.
📝 Abstract
Large language models (LLMs) present significant risks when used to generate non-factual content and spread disinformation at scale. Detecting such LLM-generated content is crucial, yet current detectors often struggle to generalize in open-world contexts. We introduce Learning2Rewrite, a novel framework for detecting AI-generated text with exceptional generalization to unseen domains. Our method leverages the insight that LLMs inherently modify AI-generated content less than human-written text when tasked with rewriting. By training LLMs to minimize alterations on AI-generated inputs, we amplify this disparity, yielding a more distinguishable and generalizable edit distance across diverse text distributions. Extensive experiments on data from 21 independent domains and four major LLMs (GPT-3.5, GPT-4, Gemini, and Llama-3) demonstrate that our detector outperforms state-of-the-art detection methods by up to 23.04% in AUROC for in-distribution tests, 37.26% for out-of-distribution tests, and 48.66% under adversarial attacks. Our training objective also yields better generalizability than training directly for classification with the same number of parameters. Our findings suggest that reinforcing LLMs' inherent rewriting tendencies offers a robust and scalable solution for detecting AI-generated text.
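The detection idea described above can be sketched in a few lines: rewrite the input with an LLM, measure how much the rewrite changed the text, and flag inputs that were barely edited as AI-generated. The sketch below is a minimal illustration under assumptions, not the paper's implementation: the LLM rewriting step is left abstract (any rewrite function can be plugged in), edit distance is computed at the word level with a standard Levenshtein dynamic program, and the `threshold` value is purely illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Word-level Levenshtein edit distance via the classic DP recurrence."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))  # distances for the empty prefix of x
    for i, wx in enumerate(x, 1):
        curr = [i]
        for j, wy in enumerate(y, 1):
            cost = 0 if wx == wy else 1  # substitution cost
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def detect_ai_text(original: str, rewritten: str, threshold: float = 0.2) -> bool:
    """Flag text as AI-generated when the LLM's rewrite barely edits it.

    `rewritten` is assumed to come from an LLM asked to rewrite `original`;
    the 0.2 threshold is a placeholder, not a value from the paper.
    """
    dist = levenshtein(original, rewritten)
    norm = dist / max(len(original.split()), 1)  # normalize by input length
    return norm < threshold  # small normalized edit distance -> likely AI
```

In practice the decision threshold would be calibrated on held-out data, and the rewriting model is the fine-tuned LLM the paper trains; here both are stand-ins to make the edit-distance signal concrete.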