🤖 AI Summary
This study addresses definition generation for under-resourced languages, using Belarusian as a case study to alleviate the lack of automated definition support in lexicography. We introduce the first Belarusian definition dataset, comprising 43,150 entries, and propose a few-shot transfer framework based on pretrained language models, integrating context-aware decoding with lightweight adaptation. Experiments demonstrate that high-quality definitions can be generated using only ~1,000 annotated examples, substantially outperforming zero-shot baselines. Moreover, we reveal systematic discrepancies between mainstream automatic metrics (e.g., BLEU, BERTScore) and human evaluation. Our contributions are threefold: (1) establishing the first benchmark dataset for Belarusian definition generation; (2) empirically validating the feasibility of few-shot definition modeling for low-resource languages; and (3) highlighting critical limitations of automatic evaluation, thereby providing both methodological insights and empirical foundations for cross-lingual lexicographic research.
📝 Abstract
Definition modeling, the task of generating new definitions for words in context, holds great promise as a means to assist the work of lexicographers in documenting a broader variety of lects and languages, yet much remains to be done to assess how we can leverage pre-existing models for as-of-yet unsupported languages. In this work, we focus on adapting existing models to Belarusian, for which we propose a novel dataset of 43,150 definitions. Our experiments demonstrate that adapting a definition modeling system requires minimal amounts of data, but also that there are currently gaps in what automatic metrics capture.