How Well Does First-Token Entropy Approximate Word Entropy as a Psycholinguistic Predictor?

๐Ÿ“… 2025-07-29
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Prior work commonly approximates word-level contextual entropy using first-subword token entropy, but this approach systematically underestimates true word entropy and distorts its psycholinguistic predictive validity. Method: We propose a variable-length subword entropy estimator based on Monte Carlo sampling to accurately model the full-word probability distribution conditioned on context. Contribution/Results: Regression analyses with eye-tracking reading time data reveal a significant divergence in predictive power between first-subword entropy and our Monte Carloโ€“estimated word entropy: the latter exhibits superior explanatory power and captures entropy variation missed by the former. This study is the first to systematically identify the theoretical limitations of the first-subword approximation, establishing a more rigorous and scalable methodological foundation for constructing entropy metrics from language models in psycholinguistics.
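To make the contrast concrete, below is a minimal sketch (not the authors' released code) of both quantities, using GPT-2 as an illustrative causal LM. The word-boundary heuristic (GPT-2's leading-space marker "Ġ"), the sample count, and the omission of the end-of-word termination probability from each sampled word's probability are all simplifying assumptions of this sketch, not details from the paper.

```python
# Minimal sketch contrasting first-token entropy with a Monte Carlo (MC)
# estimate of full-word entropy. Assumes GPT-2 and its "Ġ" leading-space
# convention as a word-boundary heuristic; not the authors' implementation.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def first_token_entropy(context: str) -> float:
    """Entropy of the next *subword token* distribution (the common proxy)."""
    ids = tokenizer(context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum().item()

def mc_word_entropy(context: str, n_samples: int = 64, max_subwords: int = 8) -> float:
    """MC plug-in estimate of *whole-word* entropy:
    H(W | context) ~= -(1/N) * sum_i log p(w_i), where each sampled word w_i
    may span a variable number of subword tokens."""
    ids = tokenizer(context, return_tensors="pt").input_ids
    sample_log_probs = []
    for _ in range(n_samples):
        cur, word_log_prob = ids.clone(), 0.0
        for step in range(max_subwords):
            with torch.no_grad():
                logits = model(cur).logits[0, -1]
            log_probs = torch.log_softmax(logits, dim=-1)
            tok = torch.multinomial(log_probs.exp(), num_samples=1)
            piece = tokenizer.convert_ids_to_tokens(tok.item())
            # A token with a leading "Ġ" after the first step begins the
            # *next* word, so the sampled word is complete. (For brevity this
            # sketch drops the termination probability from p(w_i).)
            if step > 0 and piece.startswith("Ġ"):
                break
            word_log_prob += log_probs[tok.item()].item()
            cur = torch.cat([cur, tok.view(1, 1)], dim=1)
        sample_log_probs.append(word_log_prob)
    return -sum(sample_log_probs) / len(sample_log_probs)

print(first_token_entropy("The cat sat on the"))
print(mc_word_entropy("The cat sat on the"))
```

Note the structural difference the sketch makes visible: the first-token proxy needs only one forward pass over a fixed vocabulary, while the MC estimator must sample variable-length continuations, which is why the proxy has been the convenient default.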

๐Ÿ“ Abstract
Contextual entropy is a psycholinguistic measure capturing the anticipated difficulty of processing a word just before it is encountered. Recent studies have tested for entropy-related effects as a potential complement to well-known effects from surprisal. For convenience, entropy is typically estimated based on a language model's probability distribution over a word's first subword token. However, this approximation results in underestimation and potential distortion of true word entropy. To address this, we generate Monte Carlo (MC) estimates of word entropy that allow words to span a variable number of tokens. Regression experiments on reading times show divergent results between first-token and MC word entropy, suggesting a need for caution in using first-token approximations of contextual entropy.
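The regression comparison described in the abstract can be schematized as below. This is a hedged illustration only: the column names (rt, surprisal, word_length, log_freq, first_token_entropy, mc_word_entropy), the input file, and the use of plain OLS rather than, e.g., mixed-effects models are assumptions of the sketch, not the paper's actual eye-tracking analysis.

```python
# Schematic of the reading-time regression comparison. Column names and the
# use of OLS are assumptions of this sketch, not the paper's actual analysis.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("reading_times.csv")  # hypothetical per-word data

baseline = smf.ols("rt ~ surprisal + word_length + log_freq", data=df).fit()
with_first_tok = smf.ols(
    "rt ~ surprisal + word_length + log_freq + first_token_entropy", data=df
).fit()
with_mc_word = smf.ols(
    "rt ~ surprisal + word_length + log_freq + mc_word_entropy", data=df
).fit()

# Delta log-likelihood over the baseline quantifies each entropy variant's
# added predictive power; divergence between the two deltas is the kind of
# result the paper reports.
print("first-token entropy dLL:", with_first_tok.llf - baseline.llf)
print("MC word entropy     dLL:", with_mc_word.llf - baseline.llf)
```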
Problem

Research questions and friction points this paper is trying to address.

Evaluates accuracy of first-token entropy as a word entropy proxy
Assesses distortion risks in psycholinguistic processing difficulty predictions
Compares first-token and Monte Carlo entropy in reading time analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monte Carlo estimates for word entropy
Variable token span for words
Comparison of first-token and MC entropy
๐Ÿ”Ž Similar Papers
No similar papers found.
Authors
Christian Clark, The Ohio State University
Byung-Doh Oh, New York University
William Schuler, The Ohio State University

Keywords: computational linguistics, psycholinguistics, natural language processing