Linguistic Structure from a Bottleneck on Sequential Information Processing

📅 2024-05-20
🏛️ arXiv.org
📈 Citations: 5
Influential: 0

🤖 AI Summary
This study investigates how the systematic and local structure of human language, found at the levels of phonology, morphology, syntax, and semantics, emerges from a constraint on predictive information (excess entropy). Predictive information is a statistical complexity measure, and constraining it is motivated by efficient communication under cognitive processing bottlenecks.

Method: We propose that minimizing predictive information is a unifying cognitive mechanism behind the structure found at each linguistic level. We combine theoretical analysis, simulations of codes constrained by predictive information, and cross-linguistic corpus studies that measure predictive information at each structural level.

Contribution/Results: We propose predictive information as a unified explanatory framework for multi-level linguistic structure and link the resulting codes to a sequential, discrete form of independent component analysis. The corpus studies show that human languages are structured to have low predictive information in phonology, morphology, syntax, and semantics. These findings support the hypothesis that cognitive constraints shape linguistic structure and provide an information-theoretic account of how that structure arises.
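Predictive information (excess entropy) can be estimated from a symbol sequence using block entropies. The sketch below is a minimal illustration, assuming a plug-in (maximum-likelihood) entropy estimator and a single long sequence. It uses the standard finite-length estimate E_L = H(L) - L * h_L, where H(L) is the entropy of length-L blocks and h_L = H(L) - H(L-1) estimates the entropy rate. The function names `block_entropy` and `excess_entropy` are hypothetical, and this is not presented as the estimator used in the paper's corpus studies.

```python
import math
from collections import Counter


def block_entropy(seq, L):
    """Plug-in Shannon entropy (bits) of the length-L blocks of seq."""
    if L == 0:
        return 0.0
    blocks = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    n = sum(blocks.values())
    return -sum((c / n) * math.log2(c / n) for c in blocks.values())


def excess_entropy(seq, max_len):
    """Finite-length estimate E_L = H(L) - L * h_L with h_L = H(L) - H(L-1).

    Equivalent to summing (h_k - h_L) for k = 1..L, i.e. how much the
    per-symbol uncertainty drops as more of the past is taken into account.
    """
    h_prev = block_entropy(seq, max_len - 1)
    h_curr = block_entropy(seq, max_len)
    rate = h_curr - h_prev  # entropy-rate estimate h_L
    return h_curr - max_len * rate


if __name__ == "__main__":
    # An alternating sequence must carry one bit of phase from past to future.
    print(excess_entropy("ab" * 500, 3))
```

For the alternating sequence the estimate is very close to 1 bit, and for a constant sequence it is 0. Plug-in block entropies are biased downward when L is large relative to the amount of data, so small values of `max_len` are safer on real corpora.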

📝 Abstract
Human language is a unique form of communication in the natural world, distinguished by its structured nature. Most fundamentally, it is systematic, meaning that signals can be broken down into component parts that are individually meaningful -- roughly, words -- which are combined in a regular way to form sentences. Furthermore, the way in which these parts are combined maintains a kind of locality: words are usually concatenated together, and they form contiguous phrases, keeping related parts of sentences close to each other. We address the challenge of understanding how these basic properties of language arise from broader principles of efficient communication under information processing constraints. Here we show that natural-language-like systematicity arises in codes that are constrained by predictive information, a measure of the amount of information that must be extracted from the past of a sequence in order to predict its future. In simulations, we show that such codes approximately factorize their source distributions, and then express the resulting factors systematically and locally. Next, in a series of cross-linguistic corpus studies, we show that human languages are structured to have low predictive information at the levels of phonology, morphology, syntax, and semantics. Our result suggests that human language performs a sequential, discrete form of Independent Components Analysis on the statistical distribution over meanings that need to be expressed. It establishes a link between the statistical and algebraic structure of human language, and reinforces the idea that the structure of human language is shaped by communication under cognitive constraints.
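The simulation claim, that codes which factorize their source and express each factor systematically and locally carry less predictive information, can be illustrated with a toy example. This sketch assumes two independent binary meaning features with made-up skewed priors, deterministic codes over fixed-length strings, and, as a simple stand-in for the predictive information of a finite utterance, the sum over cut points t of I(X_{<=t}; X_{>t}). The proxy, the priors, the code tables, and all function names are illustrative assumptions, not the paper's simulation setup.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (bits) of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(dist, idx):
    """Marginal distribution over the string positions listed in idx."""
    out = defaultdict(float)
    for s, p in dist.items():
        out[tuple(s[i] for i in idx)] += p
    return out


def cut_information(dist):
    """Sum over cut points t of I(prefix; suffix) for fixed-length strings."""
    n = len(next(iter(dist)))
    total = 0.0
    for t in range(1, n):
        left = marginal(dist, range(t))
        right = marginal(dist, range(t, n))
        total += entropy(left) + entropy(right) - entropy(dist)
    return total


# Hypothetical source: two independent features with skewed priors.
p_f1 = {0: 0.9, 1: 0.1}
p_f2 = {0: 0.7, 1: 0.3}
meanings = {(a, b): p_f1[a] * p_f2[b] for a in p_f1 for b in p_f2}


def encode(code):
    """Push the meaning distribution through a deterministic meaning -> string code."""
    dist = defaultdict(float)
    for m, p in meanings.items():
        dist[code[m]] += p
    return dict(dist)


codes = {
    # One symbol per feature, in a fixed order.
    "systematic": {m: f"{m[0]}{m[1]}" for m in meanings},
    # Same strings, but assigned to meanings without regard to the features.
    "holistic": {(0, 0): "00", (0, 1): "11", (1, 0): "01", (1, 1): "10"},
    # Feature 1 spelled with two adjacent symbols.
    "local": {m: f"{m[0]}{m[0]}{m[1]}" for m in meanings},
    # Same symbols, but the two copies of feature 1 are split by feature 2.
    "nonlocal": {m: f"{m[0]}{m[1]}{m[0]}" for m in meanings},
}

results = {name: cut_information(encode(code)) for name, code in codes.items()}
for name, bits in results.items():
    print(f"{name:>10}: {bits:.3f} bits")
```

With these priors the systematic code scores about 0 bits and the holistic code about 0.46 bits, even though both use the same four strings. Among the three-symbol codes, keeping the two copies of feature 1 adjacent costs about 0.47 bits (the entropy of feature 1), while splitting them around feature 2 doubles that to about 0.94 bits. Comparisons are only meaningful between codes of the same string length.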
Problem

Research questions and friction points this paper is trying to address.

Explaining how systematic linguistic structure emerges from cognitive constraints
Demonstrating that reducing predictive information shapes the organization of human language
Linking a statistical complexity measure (predictive information) to the algebraic structure of language
Innovation

Methods, ideas, or system contributions that make the work stand out.

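Using a constraint on predictive information to explain language structure
Factorizing messages into approximately independent feature groups, each expressed systematically and locally
Showing that natural languages have low predictive information across phonology, morphology, syntax, and semantics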