Neural Morphological Tagging for Nguni Languages

📅 2025-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Morphological parsing of highly agglutinative Nguni languages (e.g., Zulu, Xhosa) is challenging because annotated resources are sparse and words have complex internal structure. Method: We build neural morphological taggers that assign grammatical labels to the morphemes produced by an upstream segmenter. We compare sequence labellers trained from scratch (LSTMs and neural CRFs) with finetuned pretrained language models, and find that the from-scratch models tend to outperform the pretrained ones in this low-resource agglutinative setting. The taggers are also evaluated with varying linguistic input features (including lemmatization, part-of-speech tags, and syntactic boundaries) and with different upstream segmenters, including rule-based ones. Contribution/Results: The lightweight model achieves 92.7% F1 on Zulu and Xhosa, surpassing finetuned baselines by 3.2% absolute average accuracy and substantially outperforming a traditional rule-based parser. The findings support neural taggers built on pre-existing morphological segmenters as an accurate, deployable approach to morphological annotation for low-resource agglutinative languages.
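
A neural CRF tagger scores a tag sequence as the sum of per-morpheme emission scores and tag-to-tag transition scores. In an LSTM-CRF, the emission scores come from the LSTM. The tagger then predicts the highest-scoring sequence with Viterbi decoding. The sketch below shows only that decoding step in plain Python. It is a generic illustration, not the paper's implementation. The tag names and all scores are hypothetical, chosen so that the transition scores overturn a locally preferred tag.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[i][t]   score of tag t for morpheme i (e.g. produced by an LSTM)
    transitions[p][t] score of moving from tag p to tag t
    """
    n, k = len(emissions), len(tags)
    if n == 0:
        return []
    best = list(emissions[0])  # best[t]: best path score ending in tag t
    backpointers: List[List[int]] = []
    for i in range(1, n):
        new_best: List[float] = []
        pointers: List[int] = []
        for t in range(k):
            # Pick the previous tag that maximises the score of moving into tag t.
            prev = max(range(k), key=lambda p: best[p] + transitions[p][t])
            new_best.append(best[prev] + transitions[prev][t] + emissions[i][t])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers to the start.
    path = [max(range(k), key=lambda t: best[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[t] for t in path]


# Hypothetical tags and scores for the morphemes ngi-ya-bong-a.
TAGS = ["SC", "TAM", "VRoot", "VerbTerm"]
EMISSIONS = [
    [2.0, 0.0, 0.0, 0.0],  # ngi
    [1.0, 0.8, 0.0, 0.0],  # ya: emissions alone prefer SC
    [0.0, 0.0, 2.0, 0.0],  # bong
    [0.0, 0.0, 0.0, 1.0],  # a
]
TRANSITIONS = [[0.0] * 4 for _ in range(4)]
TRANSITIONS[0][0] = -2.0  # SC -> SC is penalised
TRANSITIONS[0][1] = 1.0   # SC -> TAM
TRANSITIONS[1][2] = 1.0   # TAM -> VRoot
TRANSITIONS[2][3] = 1.0   # VRoot -> VerbTerm

print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

The emission scores alone would tag ya as SC. The penalty on SC -> SC and the bonus on SC -> TAM make SC, TAM, VRoot, VerbTerm the best path overall. This kind of sequence-level consistency is what a CRF layer adds over tagging each morpheme independently.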

📝 Abstract

Morphological parsing is the task of decomposing words into morphemes, the smallest units of meaning in a language, and labelling their grammatical roles. It is a particularly challenging task for agglutinative languages, such as the Nguni languages of South Africa, which construct words by concatenating multiple morphemes. A morphological parsing system can be framed as a pipeline with two separate components, a segmenter followed by a tagger. This paper investigates the use of neural methods to build morphological taggers for the four Nguni languages. We compare two classes of approaches: training neural sequence labellers (LSTMs and neural CRFs) from scratch and finetuning pretrained language models. We compare performance across these two categories, as well as to a traditional rule-based morphological parser. Neural taggers comfortably outperform the rule-based baseline and models trained from scratch tend to outperform pretrained models. We also compare parsing results across different upstream segmenters and with varying linguistic input features. Our findings confirm the viability of employing neural taggers based on pre-existing morphological segmenters for the Nguni languages.
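
The two-stage framing in the abstract (a segmenter followed by a tagger) can be sketched as a function that composes two components. This minimal sketch assumes a simple interface: the segmenter maps a word to a list of morphemes, and the tagger maps that list to one tag per morpheme. The lookup-table components, the illustrative segmentation of the Zulu word ngiyabonga ('I thank'), and the tag labels are all hypothetical placeholders. They are not the paper's segmenters, tagger, or tagset.

```python
from typing import Callable, Dict, List, Tuple

Segmenter = Callable[[str], List[str]]
Tagger = Callable[[List[str]], List[str]]


def parse(word: str, segment: Segmenter, tag: Tagger) -> List[Tuple[str, str]]:
    """Segment a word, then tag each morpheme; return (morpheme, tag) pairs."""
    morphemes = segment(word)
    tags = tag(morphemes)
    if len(tags) != len(morphemes):
        raise ValueError("tagger must return exactly one tag per morpheme")
    return list(zip(morphemes, tags))


# Hypothetical toy components standing in for a real segmenter and a trained tagger.
TOY_SEGMENTATIONS: Dict[str, List[str]] = {"ngiyabonga": ["ngi", "ya", "bong", "a"]}
TOY_TAGS: Dict[str, str] = {"ngi": "SC1s", "ya": "PRES", "bong": "VRoot", "a": "VerbTerm"}


def toy_segmenter(word: str) -> List[str]:
    # Unknown words are returned unsegmented.
    return TOY_SEGMENTATIONS.get(word, [word])


def toy_tagger(morphemes: List[str]) -> List[str]:
    # A real neural tagger would use context from the whole morpheme sequence.
    return [TOY_TAGS.get(m, "UNK") for m in morphemes]


print(parse("ngiyabonga", toy_segmenter, toy_tagger))
```

The tagger only ever sees the segmenter's output, so segmentation errors propagate into tagging. This is why the comparison across upstream segmenters matters.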

Problem

Develop neural morphological taggers for the four Nguni languages

Compare neural vs. rule-based parsing performance

Evaluate segmenter impact on tagging accuracy
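
Evaluating taggers on top of different upstream segmenters needs a score that still works when the predicted morphemes differ from the gold ones. One simple option, assumed here and not necessarily the metric used in the paper, is micro-averaged F1 over (morpheme, tag) pairs. The pairs are matched within each word as multisets, so a correct tag on a wrongly segmented morpheme earns no credit. The example analyses and tags are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Analysis = List[Tuple[str, str]]  # (morpheme, tag) pairs for one word


def morpheme_tag_f1(gold: List[Analysis], pred: List[Analysis]) -> float:
    """Micro-averaged F1 over (morpheme, tag) pairs, matched per word as multisets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same words")
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        # Counter & Counter keeps the minimum count of each shared pair, so a
        # pair is correct only if both the morpheme and its tag match.
        tp += sum((Counter(g) & Counter(p)).values())
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


gold = [[("ngi", "SC1s"), ("ya", "PRES"), ("bong", "VRoot"), ("a", "VerbTerm")]]
# Hypothetical segmenter error: "ya" and "bong" are merged into one morpheme.
pred = [[("ngi", "SC1s"), ("yabong", "VRoot"), ("a", "VerbTerm")]]
print(round(morpheme_tag_f1(gold, pred), 3))  # prints 0.571
```

Here two of the three predicted pairs are correct (precision 2/3) and two of the four gold pairs are recovered (recall 1/2). The single merge error therefore costs tagging credit even though the tagger labelled the surviving morphemes correctly.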

Innovation

Neural sequence labellers for morphological tagging

Finetuning pretrained language models

Comparison with rule-based parsing methods
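
When a pretrained language model is finetuned as a tagger, its subword tokenizer may split a single morpheme into several tokens, so morpheme-level labels must be aligned to tokens. This sketch assumes a common convention rather than anything stated in the paper. Only the first subword of each morpheme is labelled, and special tokens and continuation subwords are marked with -100, the default ignore_index of PyTorch's CrossEntropyLoss. The word_ids input mirrors what Hugging Face fast tokenizers return from BatchEncoding.word_ids() when morphemes are passed with is_split_into_words=True. The ids in the example are hypothetical.

```python
from typing import List, Optional, Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(
    word_ids: Sequence[Optional[int]], morpheme_labels: Sequence[int]
) -> List[int]:
    """Map one label per morpheme onto subword tokens (first-subword convention).

    word_ids[j] is the index of the morpheme that token j came from,
    or None for special tokens added by the tokenizer.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)  # special token: excluded from the loss
        elif wid != previous:
            aligned.append(morpheme_labels[wid])  # first subword carries the tag
        else:
            aligned.append(IGNORE_INDEX)  # continuation subword: excluded
        previous = wid
    return aligned


# Hypothetical tokenisation of ngi|ya|bong|a in which "bong" splits into two subwords.
word_ids = [None, 0, 1, 2, 2, 3, None]
labels = [0, 1, 2, 3]  # one integer tag id per morpheme
print(align_labels(word_ids, labels))  # [-100, 0, 1, 2, -100, 3, -100]
```

At prediction time the same mapping works in reverse: reading the model's output at the first subword of each morpheme yields exactly one tag per morpheme.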

🔎 Similar Papers

Cross-lingual Character-Level Neural Morphological Tagging