CALE: Concept-Aligned Embeddings for Both Within-Lemma and Inter-Lemma Sense Differentiation

📅 2025-08-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fragmentation between word sense disambiguation (WSD) and cross-token semantic relation modeling by proposing Concept-Aligned Embeddings (CALE), a unified framework. Methodologically, it extends the conventional Word-in-Context paradigm from same-lemma sense resolution to cross-lexical semantic contrast, enabling joint modeling of intra-word polysemy and inter-word semantic relations. To this end, the authors construct a cross-lexical concept contrast dataset covering polysemous words and semantically related word pairs, design a concept-alignment contrastive learning objective, and fine-tune contextualized language models (e.g., BERT) on SemCor-derived data. Empirical results show that CALE reaches the best performance in the paper's experiments across diverse lexical semantic tasks, including WSD and word sense similarity, while improving semantic separability and structural coherence in the embedding space.
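The paper's exact objective is not reproduced here, but the idea of a concept-alignment contrastive loss can be illustrated with a minimal sketch: given two contextual embeddings (assumed here to be mean-pooled token vectors) and a binary label indicating whether the two occurrences realize the same concept, pull same-concept pairs together and push different-concept pairs apart. The margin value and function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def concept_contrast_loss(emb_a: np.ndarray, emb_b: np.ndarray,
                          same_concept: bool, margin: float = 0.5) -> float:
    """Margin-based contrastive loss over contextual embeddings.

    Same-concept pairs are penalized for low similarity; different-concept
    pairs are penalized only when their similarity exceeds the margin.
    (Hypothetical sketch; the actual CALE objective may differ.)
    """
    s = cosine_sim(emb_a, emb_b)
    if same_concept:
        return 1.0 - s          # pull aligned occurrences together
    return max(0.0, s - margin)  # push unrelated occurrences apart

# Toy usage: identical vectors of the same concept incur zero loss;
# similar vectors of different concepts incur a positive penalty.
a = np.array([1.0, 0.0])
b = np.array([1.0, 0.1])
print(concept_contrast_loss(a, a, same_concept=True))   # 0.0
print(concept_contrast_loss(a, b, same_concept=False))  # > 0
```

Note that unlike the original Word-in-Context setting, nothing in this formulation requires `emb_a` and `emb_b` to come from the same lemma, which is the extension the paper calls Concept Differentiation.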

📝 Abstract
Lexical semantics is concerned both with the multiple senses a word can adopt in different contexts and with the semantic relations that hold between meanings of different words. Contextualized Language Models are a valuable tool for investigating them, providing context-sensitive representations of lexical meaning. Recent works like XL-LEXEME have leveraged the Word-in-Context task to fine-tune such models toward more semantically accurate representations, but Word-in-Context only compares occurrences of the same lemma, limiting the range of information captured. In this paper, we propose an extension, Concept Differentiation, to include inter-word scenarios. We provide a dataset for this task, derived from SemCor data, and fine-tune several representation models on it. We call these models Concept-Aligned Embeddings (CALE). By evaluating our models against others on various lexical semantic tasks, we demonstrate that they provide efficient multi-purpose representations of lexical meaning that achieve the best performance in our experiments. We also show that CALE's fine-tuning brings valuable changes to the spatial organization of the embedding space.
Problem

Research questions and friction points this paper is trying to address.

Enhancing sense differentiation within and between word lemmas
Extending Word-in-Context to inter-word semantic scenarios
Improving lexical meaning representations via concept-aligned embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-Aligned Embeddings for sense differentiation
Fine-tuning models on Concept Differentiation dataset
Enhancing lexical meaning representations spatially
Bastien Liétard
University of Lille, Inria, CNRS, Centrale Lille, UMR 9189 - CRIStAL, F-59000 Lille, France
Gabriel Loiseau
PhD student, Inria, Hornetsecurity
Natural Language Processing · Privacy