Cross-lingual Character-Level Neural Morphological Tagging

📅 2017-08-30
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 73 · Influential: 10
🤖 AI Summary
This paper addresses the performance bottleneck in morphological tagging for low-resource languages caused by scarce supervised training data. The authors propose a cross-lingual, character-level joint representation learning framework: a single character-level recurrent neural tagger is shared across multiple languages and trained via multitask learning, enabling knowledge transfer from high-resource to low-resource languages even when annotated data in the target language is scarce. The key idea is a tight integration of cross-lingual character embeddings with the morphological tagging task, so that representation learning and structured prediction are optimized jointly within one unified model. Experiments on multiple low-resource languages show substantial improvements in tagging accuracy, confirming that cross-lingual character representations transfer well. The work points toward a practical approach to weakly supervised morphological analysis.
📝 Abstract
Even for common NLP tasks, sufficient supervision is not available in many languages – morphological tagging is no exception. In the work presented here, we explore a transfer learning scheme, whereby we train character-level recurrent neural taggers to predict morphological taggings for high-resource languages and low-resource languages together. Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones.
Problem

Research questions and friction points this paper is trying to address.

Develop cross-lingual morphological tagging for low-resource languages
Transfer learning from high-resource to low-resource languages
Improve accuracy using joint character-level neural models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual transfer learning for tagging
Character-level recurrent neural networks
Joint multilingual representation learning
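The core mechanism behind these contributions, a character-level RNN whose character embeddings are shared across languages while a language identifier steers the tagger, can be illustrated with a toy forward pass. This is a minimal sketch under assumptions of mine: the vocabulary, dimensions, tag set, and the untrained Elman-style recurrence are all illustrative, not the paper's actual (trained LSTM) architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# One character vocabulary shared by ALL languages: cross-lingual transfer
# happens because related languages reuse the same character embeddings.
CHARS = sorted(set("abcdefghijklmnopqrstuvwxyzäöüß "))
CHAR2ID = {c: i for i, c in enumerate(CHARS)}
LANGS = {"de": 0, "sv": 1}          # hypothetical high-/low-resource pair
TAGS = ["N;SG", "N;PL", "V;PRS"]    # toy morphological tag set

D_CHAR, D_LANG, D_HID = 8, 4, 16

# Shared parameters (random here; learned jointly via multitask training).
E_char = rng.normal(0, 0.1, (len(CHARS), D_CHAR))
E_lang = rng.normal(0, 0.1, (len(LANGS), D_LANG))
W_xh = rng.normal(0, 0.1, (D_CHAR + D_LANG, D_HID))
W_hh = rng.normal(0, 0.1, (D_HID, D_HID))
W_hy = rng.normal(0, 0.1, (D_HID, len(TAGS)))

def tag_word(word, lang):
    """Run a simple recurrent net over the characters of one word,
    conditioning each step on a language embedding, and return a
    probability distribution over morphological tags for the word."""
    lang_vec = E_lang[LANGS[lang]]
    h = np.zeros(D_HID)
    for ch in word:
        x = np.concatenate([E_char[CHAR2ID[ch]], lang_vec])
        h = np.tanh(x @ W_xh + h @ W_hh)
    logits = h @ W_hy
    p = np.exp(logits - logits.max())   # numerically stable softmax
    return p / p.sum()

# The same parameters serve every language; only the language ID differs.
p = tag_word("hunden", "sv")
```

Because every weight matrix except `E_lang` is shared, gradient updates from high-resource German training data directly shape the representations used when tagging low-resource Swedish words, which is the transfer effect the paper measures.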
👥 Authors
Ryan Cotterell (ETH Zürich)
G. Heigold (German Research Center for Artificial Intelligence, Saarbrücken, Germany)