🤖 AI Summary
This study investigates whether large language model (LLM)-generated text exhibits detectable authorial fingerprints distinguishable from human writing. To address this, we apply stylometric analysis and multidimensional register analysis to quantitatively model and compare texts across registers. Our findings reveal systematic lexical preferences in LLM outputs—particularly in nominal usage—while core grammatical dimensions (e.g., tense, aspect, mood) exhibit constrained variation, reflecting an inability to replicate human-like grammatical anchoring. Although LLMs demonstrate strong register adaptation, their underlying linguistic patterns show cross-register stability and systematic deviation from human norms. Crucially, human-level grammatical complexity emerges as a fundamental bottleneck in AI language generation. This limitation serves as a robust linguistic litmus test for non-human text, pointing to a novel, grammar-based pathway for AI content detection.
📝 Abstract
Large Language Models can emulate different writing styles, ranging from composing poetry that appears indistinguishable from that of famous poets to using slang that can convince people that they are chatting with a human online. While differences in style may not always be visible to the untrained eye, we can generally distinguish the writing of different people, like a linguistic fingerprint. This work examines whether a language model can likewise be linked to a specific fingerprint. Through stylometric and multidimensional register analyses, we compare human-authored and model-authored texts from different registers. We find that the model can successfully adapt its style depending on whether it is prompted to produce a Wikipedia entry vs. a college essay, but not in a way that makes it indistinguishable from humans. Concretely, the model shows more limited variation than humans when producing outputs in different registers. Our results suggest that the model prefers nouns to verbs, thus showing a linguistic backbone distinct from that of humans, who tend to anchor language in the highly grammaticalized dimensions of tense, aspect, and mood. It is possible that the more complex domains of grammar reflect a mode of thought unique to humans, thus acting as a litmus test for Artificial Intelligence.
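To make the nouns-vs-grammaticalized-verbs contrast concrete, here is a minimal sketch of the kind of surface feature counting used in Biber-style register analysis. This is an illustrative assumption, not the paper's actual feature set: suffix matching (`-tion`, `-ment`, `-ness`, `-ity`) is a common rough proxy for nominalizations, and a closed-class list of auxiliaries stands in for tense/aspect/mood (TAM) marking.

```python
import re

# Assumed stand-ins for two Biber-style features (not the authors' exact lists):
# nominalizations approximated by derivational suffixes, and TAM marking
# approximated by a closed class of auxiliary/modal verbs.
NOMINALIZATION = re.compile(r"\w+(?:tion|ment|ness|ity)s?$")
TAM_MARKERS = {
    "was", "were", "has", "have", "had", "been", "being",
    "will", "would", "can", "could", "may", "might",
    "shall", "should", "must",
}

def feature_rates(text: str) -> dict:
    """Return per-1000-token rates of nominalizations and TAM auxiliaries."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens) or 1  # avoid division by zero on empty input
    nom = sum(1 for t in tokens if NOMINALIZATION.match(t))
    tam = sum(1 for t in tokens if t in TAM_MARKERS)
    return {
        "nominalizations_per_1k": 1000 * nom / n,
        "tam_per_1k": 1000 * tam / n,
    }

# Comparing these rates across human- and model-authored corpora (per register)
# is the flavor of quantitative comparison the abstract describes: a nominal-heavy
# profile with flat TAM usage would match the reported LLM pattern.
print(feature_rates("the investigation of the government was important"))
```

On larger corpora one would use a real POS tagger rather than suffix heuristics, but the per-1000-token normalization and cross-register comparison carry over unchanged.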