Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods

📅 2024-06-21
🏛️ Journal of Artificial Intelligence Research
📈 Citations: 6
Influential: 0
🤖 AI Summary
This work addresses the critical AI governance challenge of detecting AI-generated text (AIGT). We systematically survey state-of-the-art detection approaches—including watermarking, statistical feature analysis, stylistic modeling, and supervised/zero-shot machine learning classifiers—and catalog major benchmark datasets. We introduce, for the first time, a multidimensional framework characterizing AIGT detectability, incorporating factors such as LLM architecture, text length, and domain alignment. Further, we propose a practical, deployment-oriented detectability evaluation framework accompanied by actionable guidelines. Empirical analysis reveals fundamental performance limits and robustness bottlenecks of existing methods under cross-model and cross-domain settings. Our findings provide theoretical foundations and an implementable technical roadmap for applications in fraud detection, academic integrity assurance, and disinformation mitigation.

📝 Abstract
Large language models (LLMs) have advanced to the point that even humans have difficulty discerning whether a text was generated by another human or by a computer. However, knowing whether a text was produced by a human or by artificial intelligence (AI) is important for determining its trustworthiness, and has applications in many domains, including detecting fraud and academic dishonesty, as well as combating the spread of misinformation and political propaganda. The task of AI-generated text (AIGT) detection is therefore both very challenging and highly critical. In this survey, we summarize state-of-the-art approaches to AIGT detection, including watermarking, statistical and stylistic analysis, and machine learning classification. We also provide information about existing datasets for this task. Synthesizing the research findings, we aim to provide insight into the salient factors that combine to determine how “detectable” AIGT is under different scenarios, and to make practical recommendations for future work towards this significant technical and societal challenge.
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated text using current methods
Assessing trustworthiness of human vs AI-written content
Identifying factors influencing detectability in different scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Watermarking for AI text detection
Statistical and stylistic analysis methods
Machine learning classification techniques