🤖 AI Summary
This study quantifies the statistical dependence between lexical identity and prosody—particularly pitch—to uncover fundamental differences in how tone, pitch-accent, and stress languages distinguish lexical items.
Method: The study is the first to formalize mutual information as a continuous typological metric. Using a multilingual speech corpus, the approach combines text–pitch-curve alignment, entropy and mutual-information estimation, and statistical modeling.
Contribution/Results: Tone languages exhibit significantly higher lexical–pitch mutual information than pitch-accent or stress languages, while pitch entropy remains comparable across types—indicating that prosodic encoding efficiency stems from pitch’s predictive power over lexical meaning, not its inherent variability. These findings challenge traditional discrete language-type classifications and provide information-theoretic evidence for a gradient prosodic typology. The work advances linguistic typology toward a quantitative, computationally grounded paradigm by establishing mutual information as a scalable, cross-linguistically comparable measure of prosodic–lexical integration.
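The central quantity, the mutual information I(W; P) between word identity W and pitch P, can be illustrated with a simple plug-in estimator over discretized pitch values. This is a toy sketch, not the paper's actual pipeline (which operates on continuous pitch curves aligned to text): the example words and pitch bins below are invented for illustration. In a tonal toy language each word determines its pitch category, so the MI is high; in a non-tonal one pitch varies independently of the word, so the MI is near zero.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(W; P) in bits from (word, pitch_bin) samples."""
    n = len(pairs)
    joint = Counter(pairs)                     # joint counts over (word, pitch_bin)
    words = Counter(w for w, _ in pairs)       # marginal counts over words
    bins = Counter(p for _, p in pairs)        # marginal counts over pitch bins
    mi = 0.0
    for (w, p), c in joint.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((words[w] / n) * (bins[p] / n)))
    return mi

# Hypothetical data: in the "tonal" toy language, word identity fixes the
# pitch bin; in the "atonal" one, pitch bins are shared across words.
tonal = [("ma1", "high"), ("ma2", "rising"), ("ma3", "low"), ("ma1", "high")] * 25
atonal = [("cat", "high"), ("cat", "low"), ("dog", "high"), ("dog", "low")] * 25
```

Here `mutual_information(tonal)` is positive while `mutual_information(atonal)` is essentially zero, mirroring the paper's finding that pitch is more predictable from text in tonal languages even when pitch entropy alone is comparable.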
📝 Abstract
This paper argues that the relationship between lexical identity and prosody -- one well-studied parameter of linguistic variation -- can be characterized using information theory. We predict that languages that use prosody to make lexical distinctions should exhibit higher mutual information between word identity and prosody than languages that do not. We test this hypothesis in the domain of pitch, which is used to make lexical distinctions in tonal languages such as Cantonese. We use a dataset of speakers reading sentences aloud in ten languages across five language families to estimate the mutual information between the text and the corresponding pitch curves. We find that, across languages, pitch curves display similar amounts of entropy. However, these curves are easier to predict given their associated text in the tonal languages than in pitch- and stress-accent languages, and thus the mutual information is higher in the tonal languages, supporting our hypothesis. Our results support perspectives that view linguistic typology as gradient, rather than categorical.