On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the fundamental challenge in LLM-generated text detection: the absence of a rigorous, universally accepted definition of “LLM-generated text,” compounded by heterogeneous models, diverse application contexts, human editing, and blurred human-AI collaboration boundaries—rendering the actual detection target an incomplete subset of raw LLM outputs. Adopting conceptual analysis and critical evaluation, the study deconstructs idealized assumptions underlying mainstream benchmarks and evaluation paradigms, exposing severe real-world biases arising from their neglect of editing provenance, prompt engineering, and model evolution. Key contributions are: (1) a redefined detection framework emphasizing contextual sensitivity and process traceability; (2) delineation of the effective operational boundaries of detection techniques; and (3) a methodological principle positioning detection outputs as auxiliary references—not definitive determinations—to support trustworthy AI content governance.

📝 Abstract
With the widespread use of large language models (LLMs), many researchers have turned their attention to detecting text generated by them. However, there is no consistent or precise definition of their target, namely "LLM-generated text". Differences in usage scenarios and the diversity of LLMs further increase the difficulty of detection. What is commonly regarded as the detection target usually represents only a subset of the text that LLMs can potentially produce. Human edits to LLM outputs, together with the subtle influences that LLMs exert on their users, are blurring the line between LLM-generated and human-written text. Existing benchmarks and evaluation approaches do not adequately address the varied conditions under which detectors are applied in the real world. Hence, the numerical results of detectors are often misunderstood, and their significance is diminishing. Detectors therefore remain useful under specific conditions, but their results should be interpreted only as references rather than decisive indicators.
Problem

Research questions and friction points this paper is trying to address.

Lack of consistent definition for LLM-generated text
Difficulty detecting edited or influenced LLM outputs
Inadequate evaluation of real-world detector conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes redefining LLM-generated text detection
Addresses human edits and LLM influences
Recommends interpreting detector results as references
Mingmeng Geng
Postdoc, ENS-PSL
large language models, computational social science, science of science, survey methodology
T. Poibeau
École Normale Supérieure (ENS) - Université Paris Sciences et Lettres (PSL), Laboratoire Lattice (CNRS, ENS-PSL, Université Sorbonne Nouvelle)