AI Summary
Prior work commonly approximates word-level contextual entropy with the entropy over a word's first subword token, but this approach systematically underestimates true word entropy and distorts its validity as a psycholinguistic predictor. Method: We propose a variable-length subword entropy estimator based on Monte Carlo sampling that models the full-word probability distribution conditioned on context. Contribution/Results: Regression analyses with eye-tracking reading-time data reveal a significant divergence in predictive power between first-subword entropy and our Monte Carlo-estimated word entropy: the latter exhibits superior explanatory power and captures entropy variation missed by the former. This study is the first to systematically identify the theoretical limitations of the first-subword approximation, establishing a more rigorous and scalable methodological foundation for constructing entropy metrics from language models in psycholinguistics.
Abstract
Contextual entropy is a psycholinguistic measure capturing the anticipated difficulty of processing a word just before it is encountered. Recent studies have tested for entropy-related effects as a potential complement to well-known effects from surprisal. For convenience, entropy is typically estimated based on a language model's probability distribution over a word's first subword token. However, this approximation results in underestimation and potential distortion of true word entropy. To address this, we generate Monte Carlo (MC) estimates of word entropy that allow words to span a variable number of tokens. Regression experiments on reading times show divergent results between first-token and MC word entropy, suggesting a need for caution in using first-token approximations of contextual entropy.
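To make the distinction concrete, here is a minimal, self-contained sketch of the two quantities, not the authors' implementation. The toy vocabulary `VOCAB`, the hand-crafted `next_token_probs` distribution, the SentencePiece-style convention that tokens beginning with `_` start a new word, and the sample size are all illustrative assumptions; the sketch only shows how a Monte Carlo estimate of word entropy over variable-length token sequences differs from the entropy of the first-subword distribution.

```python
import numpy as np

# Toy subword LM over a tiny vocabulary. Tokens beginning with "_" start a new
# word (SentencePiece-style); other tokens continue the current word. This is a
# stand-in for a real autoregressive LM, purely for illustration.
VOCAB = ["_the", "_do", "g", "gs", "_cat", "s"]

def next_token_probs(prefix_tokens):
    """Return a probability vector over VOCAB given the tokens of the current
    word generated so far (hypothetical hand-crafted distribution)."""
    if not prefix_tokens:                       # first subword of the next word
        return np.array([0.05, 0.45, 0.0, 0.0, 0.50, 0.0])
    last = prefix_tokens[-1]
    if last == "_do":                           # "do" may continue as "dog"/"dogs"
        return np.array([0.2, 0.0, 0.3, 0.3, 0.2, 0.0])
    if last == "_cat":                          # "cat" may continue as "cats"
        return np.array([0.3, 0.2, 0.0, 0.0, 0.2, 0.3])
    # other word-internal states: the next token always starts a new word
    return np.array([0.4, 0.3, 0.0, 0.0, 0.3, 0.0])

def first_token_entropy():
    """Entropy of the distribution over the word's first subword only."""
    p = next_token_probs([])
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def sample_word(rng):
    """Sample subword tokens until the next word boundary; return the sampled
    word's tokens and its total log2-probability under the toy LM."""
    tokens, logp = [], 0.0
    while True:
        p = next_token_probs(tokens)
        i = rng.choice(len(VOCAB), p=p)
        tok = VOCAB[i]
        if tokens and tok.startswith("_"):
            # The word ends here. Its probability includes the mass of *any*
            # boundary token: p(word) = p(tokens) * p(next token starts a word).
            boundary_mass = sum(p[j] for j, t in enumerate(VOCAB) if t.startswith("_"))
            return tokens, logp + np.log2(boundary_mass)
        tokens.append(tok)
        logp += np.log2(p[i])

def mc_word_entropy(n_samples=20000, seed=0):
    """Monte Carlo estimate: H(word | context) ~= -mean(log2 p(sampled word))."""
    rng = np.random.default_rng(seed)
    logps = [sample_word(rng)[1] for _ in range(n_samples)]
    return -float(np.mean(logps))

if __name__ == "__main__":
    print(f"first-token entropy : {first_token_entropy():.3f} bits")
    print(f"MC word entropy     : {mc_word_entropy():.3f} bits")
```

In this toy setting the MC estimate comes out higher than the first-token entropy: since the first subword is a deterministic function of the word, word entropy can never be lower than first-token entropy, which mirrors the underestimation the abstract describes.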