AI Summary
Although advanced language models excel at next-word prediction, the probabilities they compute can become worse predictors of human reading times as that prediction ability improves. This study systematically evaluates the relationship between reading times and model outputs by integrating eye-tracking data, neural language models, and n-gram probabilities. The findings indicate that human reading times are sensitive to simple n-gram statistics rather than the more complex statistics learned by state-of-the-art models such as Transformers: the neural language models whose predictions correlate most strongly with n-gram probabilities are also those that best predict reading times on natural text. These results challenge the prevailing assumption that large-scale language models universally capture human-like cognitive processes and offer a new perspective on computational modeling of language comprehension and reading behavior.
Abstract
Recent work has found that contemporary language models such as Transformers can become so good at next-word prediction that the probabilities they calculate become worse predictors of reading time. In this paper, we propose that this can be explained by reading time being sensitive to simple n-gram statistics rather than the more complex statistics learned by state-of-the-art Transformer language models. We demonstrate that the neural language models whose predictions are most correlated with n-gram probability are also those whose calculated probabilities are most correlated with eye-tracking-based metrics of reading time on naturalistic text.
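The kind of analysis described above can be illustrated with a minimal, self-contained sketch: estimate per-word surprisal (negative log probability) under an add-alpha-smoothed bigram model, then correlate those surprisals with per-word reading times. This is not the paper's actual pipeline; the function names, the toy corpus, and the smoothing choice are all illustrative assumptions.

```python
import math
from collections import Counter

def bigram_surprisal(corpus_tokens, text_tokens, alpha=1.0):
    """Surprisal -log2 P(w_i | w_{i-1}) under an add-alpha-smoothed bigram model.

    `corpus_tokens` trains the model; `text_tokens` is the text being read.
    Returns one surprisal value per word from the second word onward.
    """
    vocab_size = len(set(corpus_tokens))
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    surprisals = []
    for prev, w in zip(text_tokens, text_tokens[1:]):
        # Add-alpha smoothing keeps unseen bigrams from having zero probability.
        p = (bigrams[(prev, w)] + alpha) / (unigrams[prev] + alpha * vocab_size)
        surprisals.append(-math.log2(p))
    return surprisals

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy demonstration with made-up reading times (milliseconds per word).
corpus = "the dog ran and the dog sat and the cat ran".split()
text = "the dog ran".split()
surprisal = bigram_surprisal(corpus, text)
reading_times = [210.0, 260.0]  # hypothetical, aligned with surprisal values
r = pearson(surprisal, reading_times)
```

In a real evaluation one would replace the toy reading times with eye-tracking measures (e.g., gaze duration) and compare the correlation obtained from n-gram surprisal against the correlation obtained from a neural language model's surprisal on the same words.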