Phonological distances for linguistic typology and the origin of Indo-European languages

📅 2026-04-13

🤖 AI Summary
This study proposes a phonological distance metric that models phoneme sequences as second-order Markov chains within an information-theoretic framework and incorporates the articulatory features of phonemes. Applied to 67 modern languages drawn from a multilingual parallel corpus, the metric quantifies both genetic relatedness and contact-induced convergence. Although the model captures only short-range phonotactic dependencies, the resulting distance matrix recovers major language families and shows signatures of contact between languages. Phonological distance also correlates clearly with geographic distance, which the authors use to constrain a plausible homeland region for the Indo-European family, consistent with the Steppe hypothesis.
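The comparison between phonological and geographic distance mentioned above can be illustrated with a minimal sketch. It assumes great-circle (haversine) distances between one representative coordinate per language and a plain Pearson correlation over all language pairs; the paper's actual procedure is not given on this page, and the names `haversine_km`, `pearson`, and `geo_phon_correlation` are hypothetical.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(radians, p)
    lat2, lon2 = map(radians, q)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * asin(sqrt(h))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def geo_phon_correlation(coords, phon_dist):
    """Correlate geographic and phonological distance over all language pairs.

    coords:    {name: (lat, lon)}  (assumed: one point per language)
    phon_dist: {(name_a, name_b): distance}, keys in sorted name order
    """
    pairs = list(combinations(sorted(coords), 2))
    geo = [haversine_km(coords[a], coords[b]) for a, b in pairs]
    phon = [phon_dist[(a, b)] for a, b in pairs]
    return pearson(geo, phon)
```

Because the entries of a distance matrix are not independent, the coefficient alone does not establish significance; a permutation scheme such as a Mantel test is the usual way to assess it.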

📝 Abstract
We show that short-range phoneme dependencies encode large-scale patterns of linguistic relatedness, with direct implications for quantitative typology and evolutionary linguistics. Specifically, using an information-theoretic framework, we argue that phoneme sequences modeled as second-order Markov chains essentially capture the statistical correlations of a phonological system. This finding enables us to quantify distances among 67 modern languages from a multilingual parallel corpus employing a distance metric that incorporates articulatory features of phonemes. The resulting phonological distance matrix recovers major language families and reveals signatures of contact-induced convergence. Remarkably, we obtain a clear correlation with geographic distance, allowing us to constrain a plausible homeland region for the Indo-European family, consistent with the Steppe hypothesis.
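To make the second-order Markov idea concrete, here is a minimal sketch that estimates the conditional entropy of a phoneme given its two predecessors from trigram counts. It uses plain maximum-likelihood counts with no smoothing and no articulatory features; these choices, and the function name `conditional_entropy_order2`, are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter
from math import log2


def conditional_entropy_order2(phonemes):
    """Estimate H(x_t | x_{t-2}, x_{t-1}) in bits from one phoneme sequence.

    Uses raw maximum-likelihood trigram counts (an assumption of this sketch).
    Returns 0.0 for sequences shorter than three symbols.
    """
    # Count every overlapping trigram (x_{t-2}, x_{t-1}, x_t).
    trigram_counts = Counter(zip(phonemes, phonemes[1:], phonemes[2:]))

    # Count how often each two-phoneme context starts a trigram.
    context_counts = Counter()
    for (a, b, _c), n in trigram_counts.items():
        context_counts[(a, b)] += n

    total = sum(trigram_counts.values())
    entropy = 0.0
    for (a, b, _c), n in trigram_counts.items():
        p_joint = n / total                  # P(a, b, c)
        p_cond = n / context_counts[(a, b)]  # P(c | a, b)
        entropy -= p_joint * log2(p_cond)
    return entropy
```

A fully predictable sequence such as `list("abababab")` yields zero bits, while `list("aaab")`, where the context `("a", "a")` is followed equally often by `"a"` and `"b"`, yields one bit.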
Problem

Research questions and friction points this paper is trying to address.

phonological distance
linguistic typology
Indo-European origin
language relatedness
contact-induced convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

phonological distance
second-order Markov chains
information-theoretic framework
articulatory features
Indo-European homeland