🤖 AI Summary
Uzbek, a low-resource, morphologically rich language, lacks publicly available universal part-of-speech (UPOS) annotation resources and benchmark datasets for POS tagging.
Method: We construct the first open UPOS-annotated benchmark dataset for Uzbek following Universal Dependencies guidelines, fine-tune two monolingual Uzbek BERT models on this data, and systematically evaluate their performance against multilingual BERT and a rule-based tagger.
Contribution/Results: The fine-tuned monolingual Uzbek BERT models achieve an average accuracy of 91%, substantially outperforming all baselines. This work provides the first empirical validation that monolingual pretraining effectively captures suffix-driven POS variation and context-sensitive morphology, capabilities beyond the reach of traditional rule-based systems. It establishes the first publicly available UPOS benchmark for Uzbek, filling a critical gap in Uzbek NLP infrastructure, and offers a reproducible evaluation framework and an effective methodology for POS tagging in low-resource, morphologically complex languages.
📝 Abstract
This paper advances NLP research on the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91% average accuracy, outperforming both the multilingual BERT baseline and a rule-based tagger. Notably, these models capture POS changes induced by affixes and demonstrate context sensitivity, unlike existing rule-based taggers.
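Fine-tuning BERT for POS tagging, as described above, treats the task as token classification over subword pieces. Since Uzbek suffixes often end up as separate subwords, word-level UPOS labels must be aligned to subword tokens before training. A minimal sketch of that alignment step, assuming HuggingFace-style `word_ids` output (the Uzbek example and the UPOS id mapping are illustrative, not taken from the paper):

```python
# Align word-level UPOS labels to subword tokens for BERT-style
# token classification. Special tokens and subword continuations
# get the ignore index -100, so the loss counts each word once.
IGNORE_INDEX = -100

def align_labels(word_ids, word_labels):
    """word_ids: for each subword, the index of its source word
    (None for special tokens), as a HuggingFace fast tokenizer
    would produce. word_labels: one UPOS label id per word."""
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:                  # [CLS], [SEP], padding
            aligned.append(IGNORE_INDEX)
        elif wid != previous:            # first subword of a word
            aligned.append(word_labels[wid])
        else:                            # subword continuation
            aligned.append(IGNORE_INDEX)
        previous = wid
    return aligned

# Illustrative example: "Kitoblarni o'qidim" ("I read the books"),
# where suffixes -lar and -ni are split off by the tokenizer.
# UPOS ids (hypothetical mapping): NOUN=0, VERB=1.
word_ids = [None, 0, 0, 0, 1, 1, None]  # [CLS] kitob ##lar ##ni o'qi ##dim [SEP]
print(align_labels(word_ids, [0, 1]))   # [-100, 0, -100, -100, 1, -100, -100]
```

Because only the first subword of each word carries a real label, the model's predictions are read off at those positions at evaluation time, which is the standard setup for UPOS benchmarks.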