The LSCD Benchmark: a Testbed for Diachronic Word Meaning Tasks

📅 2024-03-29

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

Problem: Lexical Semantic Change Detection (LSCD) suffers from heterogeneity in dataset versions, preprocessing options, and evaluation metrics, which impedes fair model comparison and reproducibility. Method: We introduce a modular benchmark repository that standardizes evaluation across three stacked tasks: Word-in-Context (WiC), Word Sense Induction (WSI), and LSCD. WiC labels for pairs of usages are represented in a graph, WSI is applied to that graph to derive sense clusters, and LSCD labels are derived by comparing the sense clusters over time. The implementation is open and transparent. Contribution/Results: The repository makes results easier to reproduce, lets different components be freely combined, and supports evaluating each subtask on its own as well as in combination.

📝 Abstract

Lexical Semantic Change Detection (LSCD) is a complex, lemma-level task that is usually operationalized through two successively applied usage-level tasks. First, Word-in-Context (WiC) labels are derived for pairs of usages. Then, these labels are represented in a graph on which Word Sense Induction (WSI) is applied to derive sense clusters. Finally, LSCD labels are derived by comparing sense clusters over time. This modularity is reflected in most LSCD datasets and models. It also leads to a large heterogeneity in modeling options and task definitions, which is exacerbated by a variety of dataset versions, preprocessing options, and evaluation metrics. This heterogeneity makes it difficult to evaluate models under comparable conditions, to choose optimal model combinations, or to reproduce results. Hence, we provide a benchmark repository standardizing LSCD evaluation. Through transparent implementation, results become easily reproducible, and through standardization, different components can be freely combined. The repository reflects the task's modularity by allowing model evaluation for WiC, WSI, and LSCD. This allows for careful evaluation of increasingly complex model components, providing new ways of model optimization.
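
The three-stage pipeline described in the abstract can be illustrated with a small sketch. This is not the benchmark's code or API: every function name, usage ID, and score below is hypothetical, connected components over thresholded WiC scores stand in for a real WSI method, and Jensen-Shannon distance between sense distributions is just one possible way to derive a graded change score.

```python
from collections import Counter
from math import log2, sqrt


def induce_senses(usages, wic_scores, threshold=0.5):
    """Stand-in WSI: connected components of the graph whose edges are
    usage pairs with a WiC score >= threshold (union-find)."""
    parent = {u: u for u in usages}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for (a, b), score in wic_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)
    return {u: find(u) for u in usages}


def sense_distribution(usage_ids, clusters, senses):
    """Relative frequency of each sense cluster within one time period."""
    counts = Counter(clusters[u] for u in usage_ids)
    total = sum(counts.values())
    return [counts[s] / total for s in senses]


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence (range 0..1)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        return sum(xi * log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def lscd_labels(period1, period2, wic_scores, threshold=0.5):
    """Compare sense clusters over time: graded and binary change."""
    clusters = induce_senses(period1 + period2, wic_scores, threshold)
    senses = sorted(set(clusters.values()))
    p = sense_distribution(period1, clusters, senses)
    q = sense_distribution(period2, clusters, senses)
    graded = jensen_shannon_distance(p, q)
    # Binary change: some sense is attested in only one of the two periods.
    binary = any((pi == 0) != (qi == 0) for pi, qi in zip(p, q))
    return graded, binary


# Hypothetical usages of one lemma in two periods, with made-up WiC scores.
old_usages = ["a1", "a2", "a3"]
new_usages = ["b1", "b2", "b3"]
scores = {
    ("a1", "a2"): 0.9, ("a2", "a3"): 0.8,  # old usages share one sense
    ("a1", "b1"): 0.9,                     # b1 continues the old sense
    ("b2", "b3"): 0.9,                     # b2 and b3 form a new sense
    ("a1", "b2"): 0.1, ("a3", "b3"): 0.2,  # old and new senses differ
}
graded_change, binary_change = lscd_labels(old_usages, new_usages, scores)
```

Because each stage only consumes the previous stage's output (pair scores, then cluster assignments, then distributions), any one of the three functions could be swapped for a different model without touching the others, which is the kind of component combination the abstract describes.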

Problem

Research questions and friction points this paper is trying to address.

Standardizing evaluation for lexical semantic change detection

Addressing heterogeneity in dataset versions and metrics

Enabling reproducible model comparisons and component combinations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardizes LSCD evaluation via a benchmark repository

Enables modular model evaluation for WiC, WSI, and LSCD

Allows reproducible results and flexible component combination
