Rethinking the Relationship between the Power Law and Hierarchical Structures

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study empirically tests the widely held hypothesis that power-law decay of correlations in natural language is evidence for hierarchical syntactic structure. Method: Using English dependency and phrase-structure trees, we estimate mutual information, analyze deviations from probabilistic context-free grammars (PCFGs), and examine the PCFG that approximates these trees, testing whether the assumptions implicit in the argument hold. Contribution/Results: We find that the assumptions do not hold for syntactic structures, so power-law decay cannot be read directly as evidence of syntactic hierarchy, and the argument is difficult to extend to other domains such as child language or animal signals. The study calls for reconsidering the relationship between the power law and hierarchical structures.

📝 Abstract
Statistical analysis of corpora provides an approach to quantitatively investigate natural languages. This approach has revealed that several power laws consistently emerge across different corpora and languages, suggesting universal principles underlying languages. In particular, the power-law decay of correlation has been interpreted as evidence for underlying hierarchical structures in syntax, semantics, and discourse. This perspective has also been extended to child languages and animal signals. However, the argument supporting this interpretation has not been empirically tested. To address this problem, this study examines the validity of the argument for syntactic structures. Specifically, we test whether the statistical properties of parse trees align with the implicit assumptions in the argument. Using English corpora, we analyze the mutual information, deviations from probabilistic context-free grammars (PCFGs), and other properties in parse trees, as well as in the PCFG that approximates these trees. Our results indicate that the assumptions do not hold for syntactic structures and that it is difficult to apply the proposed argument to child languages and animal signals, highlighting the need to reconsider the relationship between the power law and hierarchical structures.
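
To make the quantity in the abstract concrete, here is a minimal sketch of how the mutual information between tokens separated by a distance d could be estimated from a single sequence. The function name `mutual_information_at_distance` and the plug-in (empirical frequency) estimator are assumptions chosen for illustration; the paper's own estimator and preprocessing may differ.

```python
import math
from collections import Counter

def mutual_information_at_distance(tokens, d):
    """Plug-in estimate (in bits) of I(X_t; X_{t+d}) from one token sequence.

    Illustrative only: this uses raw empirical frequencies, which are
    biased upward when the sample is small relative to the vocabulary.
    """
    # All pairs of tokens that are exactly d positions apart.
    pairs = [(tokens[i], tokens[i + d]) for i in range(len(tokens) - d)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
        mi += (c / n) * math.log2(c * n / (left[a] * right[b]))
    return mi
```

Computing this estimate over a range of d and plotting it on log-log axes is one way to inspect whether the decay looks like a power law (roughly a straight line) rather than an exponential.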
Problem

Research questions and friction points this paper is trying to address.

Tests whether power-law decay of correlations is valid evidence for syntactic hierarchy
Examines whether parse-tree statistics match the assumptions implicit in that argument
Challenges the extension of the power-law argument to child language and animal signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyze mutual information in parse trees
Test deviations of parse trees from probabilistic context-free grammars (PCFGs)
Re-examine the link between the power law and hierarchical structures
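
As a rough illustration of what "the PCFG that approximates these trees" could look like in practice, the sketch below estimates rule probabilities from a set of parse trees by relative frequency. This is a standard textbook estimator used here as an assumption, not a confirmed description of the authors' procedure; the nested-tuple tree encoding and the names `count_rules` and `estimate_pcfg` are hypothetical.

```python
from collections import Counter

def count_rules(tree, counts):
    """Count the rewrite rules used in one tree.

    Assumed encoding: a tree is (label, child, child, ...), where each
    child is either a subtree (tuple) or a terminal word (str).
    """
    label, *children = tree
    # Right-hand side: child labels for subtrees, the word itself for leaves.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c, counts)

def estimate_pcfg(trees):
    """Relative-frequency estimate of P(rhs | lhs) for every observed rule."""
    counts = Counter()
    for t in trees:
        count_rules(t, counts)
    lhs_totals = Counter()
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

# Toy usage with two hand-written trees (not real corpus data).
toy_trees = [
    ("S", ("NP", "dogs"), ("VP", "bark")),
    ("S", ("NP", "cats"), ("VP", ("V", "chase"), ("NP", "dogs"))),
]
toy_pcfg = estimate_pcfg(toy_trees)
```

A PCFG fitted this way treats each rule choice as independent of everything except the parent label, so comparing statistics of the original trees against those of the fitted grammar is one way to expose deviations from that independence assumption.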
Kai Nakaishi
RIKEN, Japan
Ryosuke Yoshida
The University of Tokyo, Japan
Kohei Kajikawa
National Institute for Japanese Language and Linguistics, Japan
Koji Hukushima
The University of Tokyo, Japan
Yohei Oseki
University of Tokyo
Computational Linguistics, Cognitive Science