Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool

📅 2024-07-04
🏛️ Information
📈 Citations: 12 (influential: 1)

🤖 AI Summary
This study investigates systematic linguistic differences between AI-generated and human-written texts at several linguistic levels. Method: Using human-authored argumentative essays as a benchmark, we prompted ChatGPT to generate essays of equivalent length, then used the Open Brain AI platform to automatically extract phonological (e.g., consonant types), morphological, syntactic (e.g., adjectival/prepositional modifiers), and lexical features (e.g., nouns, adjectives, pronouns, difficult words), followed by comparative statistical analysis. Contribution/Results: The study demonstrates an automated, multi-level linguistic comparison built on a publicly accessible online tool. AI-generated texts differed significantly from human-written ones on multiple features, including specific consonant types, nouns, adjectives, pronouns, adjectival/prepositional modifiers, and difficult words. The findings suggest that automated tools can make language assessment more efficient and that improved training methods are needed for AI to produce more human-like text.

📝 Abstract
While extensive research has focused on ChatGPT in recent years, very few studies have systematically quantified and compared linguistic features between human-written and artificial intelligence (AI)-generated language. This exploratory study aims to investigate how various linguistic components are represented in both types of texts, assessing AI’s ability to emulate human writing. Using human-authored essays as a benchmark, we prompted ChatGPT to generate essays of equivalent length. These texts were analyzed using Open Brain AI, an online computational tool, to extract measures of phonological, morphological, syntactic, and lexical constituents. Despite AI-generated texts appearing to mimic human speech, the results revealed significant differences across multiple linguistic features such as specific types of consonants, nouns, adjectives, pronouns, adjectival/prepositional modifiers, and use of difficult words, among others. These findings underscore the importance of integrating automated tools for efficient language assessment, reducing time and effort in data analysis. Moreover, they emphasize the necessity for enhanced training methodologies to improve AI’s engineering capacity for producing more human-like text.
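
The comparison described in the abstract (extract a linguistic measure per text, then test whether human and AI texts differ on it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it does not call Open Brain AI, the pronoun list and the helper names `pronoun_rate` and `permutation_test` are hypothetical, the permutation test is a stand-in for whatever statistical procedure the paper actually used, and the sample texts are toy placeholders rather than data from the study.

```python
import random
import re
from statistics import mean

# Hypothetical, deliberately small pronoun list (assumption for illustration);
# a real analysis would rely on a proper part-of-speech tagger.
PRONOUNS = {
    "i", "me", "my", "we", "us", "our", "you", "your",
    "he", "him", "his", "she", "her", "it", "its",
    "they", "them", "their",
}


def pronoun_rate(text: str) -> float:
    """Return the share of word tokens found in PRONOUNS (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in PRONOUNS for token in tokens) / len(tokens)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Toy placeholder texts, NOT data from the study.
    human_texts = [
        "I think my town should invest in its parks because we use them.",
        "We believe our schools matter, and I saw this in my own class.",
        "My view is that they should listen to us before they decide.",
    ]
    ai_texts = [
        "Urban parks provide significant environmental and social benefits.",
        "Education systems require sustained investment to remain effective.",
        "Public consultation improves the legitimacy of policy decisions.",
    ]
    human_rates = [pronoun_rate(t) for t in human_texts]
    ai_rates = [pronoun_rate(t) for t in ai_texts]
    print("mean pronoun rate (human):", mean(human_rates))
    print("mean pronoun rate (AI):", mean(ai_rates))
    print("permutation p-value:", permutation_test(human_rates, ai_rates))
```

The same two-step shape (one function per feature, one shared test) would extend to the other measures named in the abstract, such as noun or adjective proportions, provided a suitable tagger supplies the counts.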
Problem

Research questions and friction points this paper is trying to address.

Differentiating human-written and AI-generated texts using linguistic features
Quantifying linguistic differences in phonological, morphological, syntactic, and lexical components
Assessing AI's ability to emulate human writing across multiple features
Innovation

Methods, ideas, or system contributions that make the work stand out.

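Automated linguistic feature extraction with Open Brain AI, an online computational tool
Comparative analysis of human-written essays and ChatGPT-generated essays of equivalent length
Significant differences identified in consonant types, nouns, adjectives, pronouns, adjectival/prepositional modifiers, and difficult words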