Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings

📅 2025-09-17
🤖 AI Summary
Manual annotation of lexical psycholinguistic features (e.g., familiarity) is costly and poorly scalable. Method: This study systematically evaluates large language models (LLMs) as substitutes for human annotators, employing zero-shot prompting, supervised fine-tuning, and comparative evaluation across commercial and open-source LLMs—rigorously benchmarked against human “gold-standard” ratings using Spearman’s ρ. Contribution/Results: Integrating prompt engineering, lightweight fine-tuning, and statistical validation, our framework achieves ρ = 0.80 zero-shot and ρ = 0.90 after fine-tuning on English word familiarity prediction—matching or approaching inter-annotator reliability for the first time in this task. These results substantially enhance the credibility and practical utility of LLMs in psycholinguistic empirical research.

📝 Abstract
Word-level psycholinguistic norms lend empirical support to theories of language processing. However, obtaining such human-based measures is not always feasible or straightforward. One promising approach is to augment human norming datasets by using Large Language Models (LLMs) to predict these characteristics directly, a practice that is rapidly gaining popularity in psycholinguistics and cognitive science. However, the novelty of this approach (and the relative inscrutability of LLMs) necessitates the adoption of rigorous methodologies that guide researchers through this process, present the range of possible approaches, and clarify limitations that are not immediately apparent, but may, in some cases, render the use of LLMs impractical. In this work, we present a comprehensive methodology for estimating word characteristics with LLMs, enriched with practical advice and lessons learned from our own experience. Our approach covers both the direct use of base LLMs and the fine-tuning of models, an alternative that can yield substantial performance gains in certain scenarios. A major emphasis in the guide is the validation of LLM-generated data with human "gold standard" norms. We also present a software framework that implements our methodology and supports both commercial and open-weight models. We illustrate the proposed approach with a case study on estimating word familiarity in English. Using base models, we achieved a Spearman correlation of 0.8 with human ratings, which increased to 0.9 when employing fine-tuned models. This methodology, framework, and set of best practices aim to serve as a reference for future research on leveraging LLMs for psycholinguistic and lexical studies.
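The abstract notes that fine-tuning on existing human norms can yield substantial gains over base models. As a hedged sketch of the data-preparation step, the snippet below converts a word-to-rating dictionary into a chat-style JSONL training file; the message layout is an assumption (a format common to several fine-tuning APIs), not the paper's exact schema, and the norm values are invented for illustration.

```python
# Hedged sketch: preparing a supervised fine-tuning file from human norms.
# The chat-style JSONL layout is an assumption, not the paper's exact format;
# the familiarity values below are invented illustration data.
import json

norms = {"dog": 6.9, "quixotic": 2.3, "table": 6.5}  # word -> familiarity (1-7)

def to_jsonl(norms: dict) -> str:
    """One JSON record per line: user asks for a rating, assistant gives it."""
    lines = []
    for word, rating in norms.items():
        record = {
            "messages": [
                {"role": "user",
                 "content": f"Rate the familiarity of '{word}' from 1 to 7."},
                {"role": "assistant", "content": f"{rating}"},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(len(to_jsonl(norms).splitlines()))  # one record per training word
```

A file in this shape can then be passed to whichever fine-tuning backend (commercial or open-weight) the framework targets.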
Problem

Research questions and friction points this paper is trying to address.

Predicting word characteristics using Large Language Models
Augmenting human psycholinguistic norms with LLM-generated data
Validating LLM outputs against human gold standards
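The first step in the approaches listed above, eliciting a rating directly from a base model, can be sketched as a zero-shot prompt plus a defensive parser. The prompt wording and the 1-7 scale below are illustrative assumptions, not the paper's exact template, and the parser simply stands in for whatever post-processing a real chat-model reply would need.

```python
# Hypothetical sketch of zero-shot prompting for word familiarity ratings.
# The prompt text and 1-7 scale are assumptions, not the paper's template.
import re

def build_prompt(word: str) -> str:
    """Zero-shot instruction asking for a single numeric familiarity rating."""
    return (
        "On a scale from 1 (not familiar at all) to 7 (very familiar), "
        "how familiar is the word below to a typical adult English speaker? "
        "Answer with a single number only.\n\n"
        f"Word: {word}"
    )

def parse_rating(reply: str, lo: float = 1.0, hi: float = 7.0):
    """Extract the first number in the model's reply; None if unusable."""
    m = re.search(r"\d+(?:\.\d+)?", reply)
    if not m:
        return None
    value = float(m.group())
    return value if lo <= value <= hi else None

print(parse_rating("Familiarity: 6"))  # -> 6.0
```

Out-of-range or non-numeric replies return `None`, so malformed model outputs can be retried or excluded rather than silently contaminating the norms.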
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs to predict word characteristics
Validating LLM outputs with human gold standards
Providing a software framework supporting both commercial and open-weight models
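The validation step above, benchmarking LLM outputs against human "gold standard" norms with Spearman's ρ (the paper reports 0.8 for base models and 0.9 after fine-tuning), can be sketched as follows. The rating vectors are hypothetical illustration data; the implementation uses only the standard library by computing Pearson correlation over rank vectors.

```python
# Minimal sketch: validating LLM-generated ratings against human norms
# with Spearman's rank correlation (stdlib only). The ratings below are
# hypothetical illustration data, not values from the paper.

def ranks(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

human = [6.8, 2.1, 5.5, 3.9, 1.2]  # hypothetical familiarity norms (1-7)
llm   = [6.5, 2.4, 5.9, 1.9, 1.8]  # hypothetical LLM estimates
print(round(spearman_rho(human, llm), 2))  # -> 0.9
```

In practice a library routine such as `scipy.stats.spearmanr` would typically be used; the hand-rolled version here just makes the rank-then-correlate logic explicit.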
Javier Conde
Information Processing and Telecommunications Center (IPTC), Universidad Politécnica de Madrid (Spain)

María Grandury
SomosNLP / Polytechnical University of Madrid
Natural Language Processing · LLM Evaluation

Tairan Fu
Politecnico di Milano (Italy)

Carlos Arriaga
Information Processing and Telecommunications Center (IPTC), Universidad Politécnica de Madrid (Spain)

Gonzalo Martínez
Universidad Carlos III de Madrid

Thomas Clark
Massachusetts Institute of Technology (United States)

Sean Trott
Assistant Teaching Professor, UC San Diego
cognitive science · pragmatic inference · ambiguity · large language models

Clarence Gerald Green
Faculty of Education, University of Hong Kong (Hong Kong)

Pedro Reviriego
Information Processing and Telecommunications Center (IPTC), Universidad Politécnica de Madrid (Spain)

Marc Brysbaert
Professor of Cognitive Psychology, Ghent University, Belgium
psycholinguistics · bilingualism · cognitive psychology · vocabulary · individual differences