Adapting Definition Modeling for New Languages: A Case Study on Belarusian

📅 2025-07-13
🤖 AI Summary
This study addresses definition generation for under-resourced languages, using Belarusian as a case study, to alleviate the lack of automated definition support in lexicography. We introduce the first Belarusian definition dataset, comprising 43,150 entries, and propose a few-shot transfer framework based on pretrained language models that integrates context-aware decoding with lightweight adaptation. Experiments demonstrate that high-quality definitions can be generated using only ~1,000 annotated examples, substantially outperforming zero-shot baselines. Moreover, we reveal systematic discrepancies between mainstream automatic metrics (e.g., BLEU, BERTScore) and human evaluation. Our contributions are threefold: (1) establishing the first benchmark dataset for Belarusian definition generation; (2) empirically validating the feasibility of few-shot definition modeling for low-resource languages; and (3) highlighting critical limitations of automatic evaluation, thereby providing both methodological insights and empirical foundations for cross-lingual lexicographic research.

📝 Abstract
Definition modeling, the task of generating new definitions for words in context, holds great promise as a means to assist lexicographers in documenting a broader variety of lects and languages, yet much remains to be done to assess how we can leverage pre-existing models for as-yet-unsupported languages. In this work, we focus on adapting existing models to Belarusian, for which we propose a novel dataset of 43,150 definitions. Our experiments demonstrate that adapting a definition modeling system requires minimal amounts of data, but that there are currently gaps in what automatic metrics capture.
Problem

Research questions and friction points this paper is trying to address.

Adapting definition modeling to unsupported languages like Belarusian
Evaluating pre-existing models for new language applications
Assessing automatic metrics' limitations in definition modeling
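To make the last point concrete, here is a minimal sketch of how a surface-overlap metric can diverge from a human reading of definition quality. It is not the evaluation code used in the paper: `clipped_precision` is a hypothetical helper that computes only the clipped unigram precision underlying BLEU-style scores, and the English example definitions are invented for illustration rather than taken from the Belarusian dataset.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against a single reference.

    Hypothetical helper for illustration: whitespace tokenization,
    lowercasing, one reference, no brevity penalty.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Invented English examples (assumption: not drawn from the paper's data).
reference = "a domesticated animal kept for companionship"
paraphrase = "a tame creature that people keep as a pet"  # same meaning, different words
near_copy = "a domesticated animal kept for food"         # similar words, different meaning

score_paraphrase = clipped_precision(paraphrase, reference)
score_near_copy = clipped_precision(near_copy, reference)
print(f"paraphrase: {score_paraphrase:.3f}  near copy: {score_near_copy:.3f}")
```

Under these assumptions the near copy, which changes the meaning, scores well above the faithful paraphrase. This is one plausible way an overlap-based metric and a human annotator could rank the same outputs differently.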
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapting definition modeling to the Belarusian language
Proposing a novel dataset of 43,150 definitions
Showing that adapting existing models requires only minimal data
Daniela Kazakouskaya
University of Helsinki
Timothee Mickus
University of Helsinki
NLG, NLP, Distributional Semantics, Word Embeddings
Janine Siewert
University of Helsinki