🤖 AI Summary
This study investigates whether Heaps' law, which describes how vocabulary grows with text length, is modulated by part-of-speech (POS) category in natural dialogue (video-chat conversations between strangers) versus fictional dialogue (scripted movie dialogue). Using corpora from both mediums, POS tagging, and POS-specific Heaps' law fits, the authors compare vocabulary growth rates for core word classes (e.g., nouns, verbs) across the two discourse types, which the summary describes as the first systematic comparison of this kind. Results indicate that nouns expand significantly faster in fictional dialogue, whereas verbs grow faster in natural dialogue, suggesting that discourse type (interactive vs. narrative) shapes statistical regularities of language. The work connects statistical linguistics with pragmatics and narratology, offering empirical grounding and an analytical framework for understanding how communicative function shapes macro-level language statistics.
📝 Abstract
Conversation is a cornerstone of social connection and is linked to well-being outcomes. Conversations vary widely in type, with some portion generating complex, dynamic stories. One approach to studying how conversations unfold in time is through statistical patterns such as Heaps' law, which holds that vocabulary size scales with document length. Little work on Heaps' law has looked at conversation or considered how language features affect scaling. We measure Heaps' law for conversations recorded in two distinct mediums: (1) strangers brought together on video chat and (2) fictional characters in movies. We find that the scaling of vocabulary size differs by part of speech. We discuss these findings through behavioral and linguistic frameworks.
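
For readers unfamiliar with the scaling relation, Heaps' law is commonly written as V(n) = K · n^β, where V is the number of distinct word types seen after n tokens and K and β are fitted constants. The Python sketch below illustrates one way a per-POS fit could be set up; it is not the authors' pipeline. It assumes the tokens are already POS-tagged, that n counts tokens within each POS category (the abstract does not say which convention the paper uses), and that the exponent is estimated by ordinary least squares in log-log space. The function names `vocab_growth` and `fit_heaps` are hypothetical.

```python
import math
from collections import defaultdict


def vocab_growth(tagged_tokens):
    """Build a vocabulary growth curve for each POS tag.

    tagged_tokens: iterable of (word, pos) pairs in conversational order.
    Returns {pos: [(n, V), ...]}, where n is the number of tokens of that
    POS seen so far (an assumption of this sketch) and V is the number of
    distinct lowercased word types among them.
    """
    seen = defaultdict(set)
    counts = defaultdict(int)
    curves = defaultdict(list)
    for word, pos in tagged_tokens:
        counts[pos] += 1
        seen[pos].add(word.lower())
        curves[pos].append((counts[pos], len(seen[pos])))
    return dict(curves)


def fit_heaps(curve):
    """Fit log V = log K + beta * log n by least squares.

    curve: list of (n, V) pairs with at least two distinct n values.
    Returns (K, beta).
    """
    if len(curve) < 2:
        raise ValueError("need at least two points to fit a line")
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all n values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    K = math.exp(mean_y - beta * mean_x)
    return K, beta


if __name__ == "__main__":
    # Toy input, invented purely for illustration.
    toy = [("dog", "NOUN"), ("runs", "VERB"), ("Dog", "NOUN"),
           ("cat", "NOUN"), ("sleeps", "VERB"), ("runs", "VERB")]
    for pos, curve in vocab_growth(toy).items():
        print(pos, fit_heaps(curve))
```

Under these assumptions, a larger fitted β for one POS category means its vocabulary keeps expanding faster as the conversation lengthens, which is the kind of per-category comparison the abstract describes. A log-log regression is only one of several estimation choices and is sensitive to the very short early part of the curve.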