Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans

📅 2023-08-14

🏛️ Machine Learning with Applications

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

This study investigates how generative AI, exemplified by ChatGPT, affects lexical usage and language evolution. It conducts a systematic, cross-domain comparison of ChatGPT-generated and human-produced texts in terms of lexical breadth, frequency distribution, and diversity, using standardized metrics including MTLD, HD-D, and VOCD to quantify surface-level lexical richness for the first time. The results indicate that although ChatGPT achieves high lexical coverage, its word frequency distribution is markedly uniform and lacks the bursty occurrence of low-frequency words characteristic of human speech and writing; consequently, its overall lexical diversity is significantly lower than that of human corpora. These findings point to a fundamental divergence in how large language models and humans produce vocabulary, and they provide empirical grounding and a methodological basis for studying AI-driven language change.
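To make the lexical diversity metrics named above more concrete, here is a minimal sketch of two of them: the plain type-token ratio (TTR) and a simplified MTLD. It is not the paper's code. The regex tokenizer, the function names, and the handling of texts that never complete a factor are assumptions made for illustration, and the 0.72 threshold is the value conventionally used for MTLD.

```python
import re

# Conventional MTLD threshold; treated here as an assumption, not a value taken from the paper.
TTR_THRESHOLD = 0.72


def tokenize(text):
    """Lowercase the text and keep alphabetic runs (a deliberately crude, hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ttr(tokens):
    """Type-token ratio: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def _mtld_one_pass(tokens, threshold):
    """Count how many 'factors' (segments whose TTR drops below the threshold) the text contains."""
    factors = 0.0
    seen = set()
    length = 0
    for tok in tokens:
        seen.add(tok)
        length += 1
        if len(seen) / length < threshold:
            # The running TTR fell below the threshold: close this factor and start a new one.
            factors += 1.0
            seen.clear()
            length = 0
    if length > 0:
        # Credit the unfinished tail in proportion to how far its TTR has fallen from 1.0.
        factors += (1.0 - len(seen) / length) / (1.0 - threshold)
    # Assumption: a text that never repeats a word has no factors, reported here as infinity.
    return len(tokens) / factors if factors > 0 else float("inf")


def mtld(tokens, threshold=TTR_THRESHOLD):
    """Simplified MTLD: mean of a forward and a backward pass over the token list."""
    if not tokens:
        return 0.0
    forward = _mtld_one_pass(tokens, threshold)
    backward = _mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2.0


if __name__ == "__main__":
    sample = tokenize("The cat sat on the mat while the dog slept on the rug.")
    print("TTR:", ttr(sample))
    print("MTLD:", mtld(sample))
```

A higher MTLD means the text runs longer before its vocabulary starts repeating, which is why the metric is less sensitive to text length than raw TTR. Comparing ChatGPT and human texts in this way would mean computing the score for each text and comparing the two distributions.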

Problem

Research questions and friction points this paper is trying to address.

Compares vocabulary and lexical diversity between ChatGPT and humans.

Investigates impact of AI-generated text on language evolution and reader capabilities.

Assesses whether ChatGPT reduces vocabulary richness in generated content.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compares ChatGPT and human vocabulary diversity.

Analyzes lexical richness in AI-generated text.

Uses datasets for ChatGPT and human responses.

🔎 Similar Papers

Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool