In the LLM era, Word Sense Induction remains unsolved

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the scarcity of annotated data in unsupervised Word Sense Induction (WSI), particularly in low-resource or domain-specific settings, by proposing a new evaluation benchmark derived from SemCor that better reflects real-world polysemy and word frequency distributions. The work systematically evaluates pretrained embeddings, clustering algorithms, and large language models (LLMs) on this benchmark. Leveraging Wiktionary knowledge, the authors enhance WSI through must-link constrained clustering and semi-supervised data augmentation. Experimental results show that the proposed approach outperforms the previous state-of-the-art system by 3.3% on the new test set; however, it still fails to surpass the simple "one cluster per lemma" heuristic baseline, indicating that WSI remains fundamentally unresolved. This study also provides the first empirical validation of Wiktionary's utility for WSI and reveals inherent limitations of current LLMs on this task.
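The summary mentions must-link constrained clustering: instances that a knowledge source (here, Wiktionary) ties to the same sense are forced into the same cluster before any unsupervised grouping happens. A standard way to apply such constraints is to merge must-linked instances with a union-find structure and then cluster the merged groups. The paper does not spell out its implementation, so the sketch below is illustrative: the constraint pairs are hypothetical, and only the merging step (not the downstream clustering) is shown.

```python
class DSU:
    """Union-find over instance indices, used to merge must-linked items."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path-halving lookup of the representative of x's set.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_must_links(n_instances, must_links):
    """Collapse must-linked instances into groups.

    Each group can then be represented by, e.g., its centroid embedding
    and fed to any clustering algorithm, guaranteeing the constraints hold.
    """
    dsu = DSU(n_instances)
    for a, b in must_links:
        dsu.union(a, b)
    groups = {}
    for i in range(n_instances):
        groups.setdefault(dsu.find(i), []).append(i)
    return list(groups.values())


# Hypothetical example: 5 occurrences of a lemma; Wiktionary evidence
# links occurrences (0,1), (1,2) and (3,4) to shared senses.
groups = merge_must_links(5, [(0, 1), (1, 2), (3, 4)])
```

After this step, clustering operates on two group representatives instead of five instances, so no later merge or split can violate a constraint.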

📝 Abstract
In the absence of sense-annotated data, word sense induction (WSI) is a compelling alternative to word sense disambiguation, particularly in low-resource or domain-specific settings. In this paper, we emphasize methodological problems in current WSI evaluation. We propose an evaluation on a SemCor-derived dataset, respecting the original corpus polysemy and frequency distributions. We assess pre-trained embeddings and clustering algorithms across parts of speech, and propose and evaluate an LLM-based WSI method for English. We evaluate data augmentation sources (LLM-generated, corpus and lexicon), and semi-supervised scenarios using Wiktionary for data augmentation, must-link constraints, and the number of clusters per lemma. We find that no unsupervised method (whether ours or previous) surpasses the strong "one cluster per lemma" heuristic (1cpl). We also show that (i) results and best systems may vary across POS, (ii) LLMs have trouble performing this task, (iii) data augmentation is beneficial and (iv) capitalizing on Wiktionary does help. It surpasses the previous SOTA system on our test set by 3.3%. WSI is not solved, and calls for a better articulation of lexicons and LLMs' lexical semantics capabilities.
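The "one cluster per lemma" (1cpl) heuristic the abstract cites is strong because clustering-based WSI metrics such as B-Cubed (used in past SemEval WSI evaluations) give a trivial single-cluster prediction perfect recall, and most lemma occurrences in running text belong to one dominant sense. The sketch below computes B-Cubed precision/recall/F1 on a toy, made-up gold labeling to show this effect; the paper's actual metric and data are not reproduced here.

```python
from collections import defaultdict


def bcubed(gold, pred):
    """B-Cubed precision, recall and F1 over parallel label lists.

    For each item i, precision is |C(i) ∩ G(i)| / |C(i)| and recall is
    |C(i) ∩ G(i)| / |G(i)|, where C(i) is i's predicted cluster and G(i)
    its gold class; scores are averaged over items.
    """
    n = len(gold)
    gold_groups, pred_groups = defaultdict(set), defaultdict(set)
    for i, (g, p) in enumerate(zip(gold, pred)):
        gold_groups[g].add(i)
        pred_groups[p].add(i)
    prec = rec = 0.0
    for i in range(n):
        c, g = pred_groups[pred[i]], gold_groups[gold[i]]
        inter = len(c & g)
        prec += inter / len(c)
        rec += inter / len(g)
    p, r = prec / n, rec / n
    return p, r, 2 * p * r / (p + r)


# Toy example: six occurrences of one lemma, four of the majority sense.
gold = ["s1", "s1", "s1", "s1", "s2", "s2"]
one_cluster = ["c0"] * len(gold)          # the 1cpl prediction
p, r, f = bcubed(gold, one_cluster)       # recall is 1.0 by construction
```

With a skewed sense distribution like this, 1cpl already reaches an F1 around 0.71, which an induced clustering must beat by splitting senses correctly rather than merely splitting.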
Problem

Research questions and friction points this paper is trying to address.

Word Sense Induction
Unsupervised Learning
Evaluation Methodology
Lexical Semantics
Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Word Sense Induction
Large Language Models
Data Augmentation
Wiktionary
Unsupervised Evaluation