Do Large Language Models Understand Word Senses?

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the foundational question of whether large language models (LLMs) genuinely comprehend context-dependent word meanings. It evaluates models on four tasks spanning multiple domains and difficulty levels: word sense disambiguation (WSD), definition generation, free-form explanation, and example sentence generation. Instruction-tuned LLMs (GPT-4o, DeepSeek-V3) are assessed comparatively against dedicated WSD systems. Results show that leading LLMs match state-of-the-art specialized models on WSD while proving more robust across domains and difficulty levels, and reach up to 98% accuracy on the generative tasks, with the best performance in free-form explanation. The key contribution is an integrated evaluation framework unifying discriminative and generative paradigms for lexical semantic understanding; the findings indicate that current mainstream LLMs possess fine-grained, context-sensitive word sense comprehension.

📝 Abstract
Understanding the meaning of words in context is a fundamental capability for Large Language Models (LLMs). Despite extensive evaluation efforts, the extent to which LLMs show evidence that they truly grasp word senses remains underexplored. In this paper, we address this gap by evaluating both i) the Word Sense Disambiguation (WSD) capabilities of instruction-tuned LLMs, comparing their performance to state-of-the-art systems specifically designed for the task, and ii) the ability of two top-performing open- and closed-source LLMs to understand word senses in three generative settings: definition generation, free-form explanation, and example generation. Notably, we find that, in the WSD task, leading models such as GPT-4o and DeepSeek-V3 achieve performance on par with specialized WSD systems, while also demonstrating greater robustness across domains and levels of difficulty. In the generation tasks, results reveal that LLMs can explain the meaning of words in context up to 98% accuracy, with the highest performance observed in the free-form explanation task, which best aligns with their generative capabilities.
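The WSD setting described above can be made concrete with a small sketch: for each target word in context, a system chooses one sense from a set of candidate glosses, and accuracy is computed against gold sense labels. The instances, sense keys, and the `pick_sense` heuristic below (a simplified Lesk-style gloss overlap) are illustrative assumptions, not the paper's method or data.

```python
def pick_sense(context: str, candidates: dict[str, str]) -> str:
    """Toy baseline: pick the sense whose gloss shares the most
    words with the context (a simplified Lesk-style overlap)."""
    ctx_words = set(context.lower().split())
    return max(
        candidates,
        key=lambda s: len(ctx_words & set(candidates[s].lower().split())),
    )

# Each instance: (context sentence, candidate sense glosses, gold sense key).
instances = [
    ("The bank accepts deposits and offers savings accounts.",
     {"bank.n.1": "a financial institution that accepts deposits",
      "bank.n.2": "sloping land beside a body of water"},
     "bank.n.1"),
    ("They had a picnic on the bank of the river.",
     {"bank.n.1": "a financial institution that accepts deposits",
      "bank.n.2": "sloping land beside a body of water"},
     "bank.n.2"),
]

# Accuracy over gold labels, as in standard WSD evaluation.
correct = sum(pick_sense(ctx, cands) == gold for ctx, cands, gold in instances)
accuracy = correct / len(instances)
print(f"accuracy: {accuracy:.0%}")  # prints "accuracy: 100%"
```

In the paper's setup, the toy `pick_sense` baseline would be replaced by a specialized WSD system or by prompting an instruction-tuned LLM with the context and candidate glosses; the scoring logic is unchanged.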
Problem

Research questions and friction points this paper is trying to address.

Do instruction-tuned LLMs truly grasp word senses in context?
How do LLMs compare with systems designed specifically for WSD?
Can LLMs demonstrate sense understanding generatively, through definitions, explanations, and examples?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated evaluation framework unifying discriminative (WSD) and generative paradigms
Direct comparison of instruction-tuned LLMs (GPT-4o, DeepSeek-V3) with specialized WSD systems
Three generative probes of word sense understanding: definition generation, free-form explanation, example generation
Domenico Meconi
Babelscape
Simone Stirpe
Babelscape
Federico Martelli
Sapienza NLP Group, Sapienza University of Rome
Leonardo Lavalle
Sapienza NLP Group, Sapienza University of Rome
Roberto Navigli
Professor, Sapienza University of Rome
Natural Language Processing · Semantics · Computational Linguistics · Knowledge Acquisition · Artificial Intelligence