Do Large Language Models Understand Word Senses?

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the foundational question of whether large language models (LLMs) genuinely comprehend context-dependent word meanings. It evaluates models on four tasks spanning multiple domains and difficulty levels: word sense disambiguation (WSD), definition generation, free-form explanation, and example sentence generation. Instruction-tuned LLMs (GPT-4o, DeepSeek-V3) are assessed comparatively against dedicated WSD systems. Results show that leading LLMs match state-of-the-art specialized models on WSD while proving more robust across domains and difficulty levels, and reach up to 98% accuracy on the generative tasks, with the best performance in free-form explanation. The key contribution is an integrated evaluation framework unifying discriminative and generative paradigms for lexical semantic understanding; the findings indicate that current mainstream LLMs possess fine-grained, context-sensitive word sense comprehension.

📝 Abstract
Understanding the meaning of words in context is a fundamental capability for Large Language Models (LLMs). Despite extensive evaluation efforts, the extent to which LLMs show evidence that they truly grasp word senses remains underexplored. In this paper, we address this gap by evaluating both i) the Word Sense Disambiguation (WSD) capabilities of instruction-tuned LLMs, comparing their performance to state-of-the-art systems specifically designed for the task, and ii) the ability of two top-performing open- and closed-source LLMs to understand word senses in three generative settings: definition generation, free-form explanation, and example generation. Notably, we find that, in the WSD task, leading models such as GPT-4o and DeepSeek-V3 achieve performance on par with specialized WSD systems, while also demonstrating greater robustness across domains and levels of difficulty. In the generation tasks, results reveal that LLMs can explain the meaning of words in context up to 98% accuracy, with the highest performance observed in the free-form explanation task, which best aligns with their generative capabilities.
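The WSD setting described above can be made concrete with a small sketch: for each target word in context, a system chooses one sense from a set of candidate glosses, and accuracy is computed against gold sense labels. The instances, sense keys, and the `pick_sense` heuristic below (a simplified Lesk-style gloss overlap) are illustrative assumptions, not the paper's method or data.

```python
def pick_sense(context: str, candidates: dict[str, str]) -> str:
    """Toy baseline: pick the sense whose gloss shares the most
    words with the context (a simplified Lesk-style overlap)."""
    ctx_words = set(context.lower().split())
    return max(
        candidates,
        key=lambda s: len(ctx_words & set(candidates[s].lower().split())),
    )

# Each instance: (context sentence, candidate sense glosses, gold sense key).
instances = [
    ("The bank accepts deposits and offers savings accounts.",
     {"bank.n.1": "a financial institution that accepts deposits",
      "bank.n.2": "sloping land beside a body of water"},
     "bank.n.1"),
    ("They had a picnic on the bank of the river.",
     {"bank.n.1": "a financial institution that accepts deposits",
      "bank.n.2": "sloping land beside a body of water"},
     "bank.n.2"),
]

# Accuracy over gold labels, as in standard WSD evaluation.
correct = sum(pick_sense(ctx, cands) == gold for ctx, cands, gold in instances)
accuracy = correct / len(instances)
print(f"accuracy: {accuracy:.0%}")  # prints "accuracy: 100%"
```

In the paper's setup, the toy `pick_sense` baseline would be replaced by a specialized WSD system or by prompting an instruction-tuned LLM with the context and candidate glosses; the scoring logic is unchanged.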
Problem

Research questions and friction points this paper is trying to address.

Do instruction-tuned LLMs truly grasp word senses in context?
How do LLMs compare with systems designed specifically for WSD?
Can LLMs demonstrate sense understanding generatively, through definitions, explanations, and examples?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated evaluation framework unifying discriminative (WSD) and generative paradigms
Direct comparison of instruction-tuned LLMs (GPT-4o, DeepSeek-V3) with specialized WSD systems
Three generative probes of word sense understanding: definition generation, free-form explanation, example generation
Domenico Meconi
Babelscape
Simone Stirpe
Babelscape
Federico Martelli
Sapienza NLP Group, Sapienza University of Rome
Leonardo Lavalle
Sapienza NLP Group, Sapienza University of Rome
Roberto Navigli
Professor, Sapienza University of Rome
Natural Language Processing · Semantics · Computational Linguistics · Knowledge Acquisition · Artificial Intelligence