🤖 AI Summary
This paper investigates the statistical relationship between the number of types and the total number of tokens in complex systems, focusing on the intrinsic connection between Zipf’s law (a power-law rank-frequency distribution) and Heaps’ law (sublinear growth of vocabulary size with text length).
Method: We propose a purely deterministic asymptotic model derived solely from the ranked form of the Zipf distribution, enabling rigorous analytical derivation of the type–token relationship without stochastic sampling or probabilistic assumptions.
Contribution/Results: The model yields a single asymptotic expression for the type–token relationship that holds for all values of the Zipf exponent α. It corrects earlier results in the special cases α = 1 and α ≫ 1, and it gives the exact asymptotics of type growth in a finite system directly from Zipf’s law. The results show that Zipf’s law alone suffices to determine Heaps-type scaling, offering a unified deterministic theoretical basis for type–token scaling in linguistic, ecological, and other complex systems.
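For rough orientation, the standard leading-order continuum heuristic connects the two laws. This is not the paper's exact result. It ignores integer rounding of counts, and the profile $S_r = S_1 r^{-\alpha}$ (top-ranked type holding $S_1$ tokens) is an illustrative assumption. Under it, $N = S_1^{1/\alpha}$ types hold at least one token, and for fixed $\alpha$ the total token count $M$ and the type count $N$ satisfy:

```latex
\[
M \;=\; \sum_{r=1}^{N} S_1 r^{-\alpha} \;=\; N^{\alpha} \sum_{r=1}^{N} r^{-\alpha}
\quad\Longrightarrow\quad
N \simeq
\begin{cases}
(1-\alpha)\, M, & 0 < \alpha < 1,\\[2pt]
M / \ln M, & \alpha = 1,\\[2pt]
\bigl(M / \zeta(\alpha)\bigr)^{1/\alpha}, & \alpha > 1,
\end{cases}
\qquad M \to \infty .
\]
```

The three cases follow from three different approximations. For $\alpha < 1$, the sum is replaced by $N^{1-\alpha}/(1-\alpha)$. For $\alpha = 1$, the harmonic number $H_N \approx \ln N$ is used. For $\alpha > 1$, the convergent series $\sum r^{-\alpha}$ is replaced by $\zeta(\alpha)$. The abstract reports that the exact treatment corrects earlier results for $\alpha = 1$ and $\alpha \gg 1$, so these forms are leading-order guides only.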
📝 Abstract
The growth dynamics of complex systems often exhibit statistical regularities involving power-law relationships. For real finite complex systems formed by countable tokens (animals, words) as instances of distinct types (species, dictionary entries), an inverse power-law scaling $S \sim r^{-\alpha}$ between type count $S$ and type rank $r$, commonly known as Zipf's law, is widely observed to varying degrees of fidelity. A secondary, summary relationship is Heaps' law, which states that the number of types scales sublinearly with the total number of tokens observed in a growing system. Here, we propose an idealized model of a growing system that (1) deterministically produces arbitrary inverse power-law count rankings for types, and (2) allows us to determine the exact asymptotics of the type-token relationship. Our argument improves upon and remedies earlier work. We obtain a unified asymptotic expression for all values of $\alpha$, which corrects the special cases of $\alpha = 1$ and $\alpha \gg 1$. Our approach relies solely on the form of count rankings, avoids unnecessary approximations, and does not involve any stochastic mechanisms or sampling processes. We thereby demonstrate that a general type-token relationship arises solely as a consequence of Zipf's law.
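The deterministic, sampling-free character of such a construction can be illustrated numerically. The sketch below is a hypothetical stand-in, not the authors' model. It makes the following assumptions:

- Each snapshot of a growing system assigns $\lfloor S_1 / r^{\alpha} \rfloor$ tokens to the type of rank $r$.
- The system grows by increasing the top count $S_1$.
- At each snapshot, we record the total token count $M$ and the number of types $N$.

The helper names `zipf_counts` and `type_token_point` are invented for illustration.

```python
import math


def zipf_counts(s1: int, alpha: float) -> list[int]:
    """Rank-ordered token counts for one snapshot of a hypothetical growing system.

    Assumption (illustration only): the type of rank r holds floor(s1 / r**alpha)
    tokens, where s1 is the count of the top-ranked type. Types whose count
    rounds down to zero are not yet present in the system.
    """
    if s1 < 1 or alpha <= 0:
        raise ValueError("need s1 >= 1 and alpha > 0")
    counts = []
    r = 1
    while True:
        # Divide rather than multiply by r**-alpha, so exact integer ratios
        # such as 49 / 49 are not rounded down to 0 by floating-point error.
        c = math.floor(s1 / r ** alpha)
        if c < 1:
            break
        counts.append(c)
        r += 1
    return counts


def type_token_point(s1: int, alpha: float) -> tuple[int, int]:
    """Return (M, N): total tokens and number of distinct types for one snapshot."""
    counts = zipf_counts(s1, alpha)
    return sum(counts), len(counts)


if __name__ == "__main__":
    alpha = 2.0
    # Grow the system by increasing the top count s1 and track N against M.
    for s1 in (10**2, 10**4, 10**6):
        m, n = type_token_point(s1, alpha)
        # For alpha > 1 the continuum heuristic suggests
        # N / M**(1/alpha) -> zeta(alpha)**(-1/alpha); for alpha = 2 that is
        # sqrt(6)/pi, about 0.780.
        print(f"s1={s1:>8}  M={m:>8}  N={n:>5}  N/M^(1/alpha)={n / m ** (1 / alpha):.4f}")
```

For $\alpha = 2$, the printed ratio $N / M^{1/2}$ should drift toward $\sqrt{6}/\pi \approx 0.780$, the value suggested by the heuristic $N \simeq (M/\zeta(2))^{1/2}$. Finite-size effects shift the ratio at small $S_1$. For $\alpha < 1$, the loop visits about $S_1^{1/\alpha}$ ranks, so keep $S_1$ modest there.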