Robust Language Identification for Romansh Varieties

📅 2026-03-16
🤖 AI Summary
This study addresses the challenge of distinguishing between the regional varieties of Romansh, called idioms, and its supra-regional standard variety, Rumantsch Grischun. No documented language identification system has previously been built for this task. To this end, the work proposes a support vector machine (SVM)-based language identification system for Romansh idioms. It also introduces a newly curated benchmark that covers both the idioms and Rumantsch Grischun across two textual domains. Experimental results show that the system reaches an average in-domain accuracy of 97%, enabling downstream applications such as idiom-aware spell checking and machine translation.

πŸ“ Abstract
The Romansh language has several regional varieties, called idioms, which sometimes have limited mutual intelligibility. Despite this linguistic diversity, there has been a lack of documented efforts to build a language identification (LID) system that can distinguish between these idioms. Because a Romansh LID system should also recognize Rumantsch Grischun, a supra-regional variety that combines elements of several idioms, the task is a novel and interesting classification problem. In this paper, we present a LID system for Romansh idioms based on an SVM approach. We evaluate our model on a newly curated benchmark across two domains and find that it reaches an average in-domain accuracy of 97%, enabling applications such as idiom-aware spell checking or machine translation. Our classifier is publicly available.
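
The abstract states only that the classifier is SVM-based. The following is a minimal, hedged sketch of how such an idiom identifier could be structured, not the authors' released classifier. It rests on several assumptions that are not taken from the paper:

- character 1-to-3-gram features, a common choice for LID;
- a one-vs-rest linear SVM trained with the Pegasos sub-gradient solver;
- hypothetical labels and toy placeholder data.

The names `char_ngrams`, `train_binary_svm` and `IdiomClassifier` are illustrative.

```python
"""Sketch: one-vs-rest linear SVM language identification over character n-grams.

Assumptions (not from the paper): 1-3-gram character features, Pegasos training,
hypothetical labels, and toy placeholder data.
"""
import math
import random
from collections import Counter


def char_ngrams(text, n_max=3):
    """L2-normalised counts of character 1..n_max-grams of a space-padded string."""
    padded = f" {text.lower()} "
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            if gram.strip():  # skip n-grams made only of whitespace
                counts[gram] += 1
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {g: c / norm for g, c in counts.items()} if norm else {}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=20, seed=0):
    """Pegasos (without the optional projection step): stochastic sub-gradient
    descent on the L2-regularised hinge loss. ys are +1/-1; no bias term."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = ys[i] * dot(w, xs[i]) < 1.0  # margin check before updating
            for k in w:  # shrink step from the L2 regulariser
                w[k] *= 1.0 - eta * lam
            if violated:  # hinge-loss sub-gradient step
                for k, v in xs[i].items():
                    w[k] = w.get(k, 0.0) + eta * ys[i] * v
    return w


class IdiomClassifier:
    """Hypothetical one-vs-rest wrapper: one binary SVM per label, argmax at test time."""

    def fit(self, texts, labels):
        xs = [char_ngrams(s) for s in texts]
        self.labels = sorted(set(labels))
        self.weights = {
            lab: train_binary_svm(xs, [1 if y == lab else -1 for y in labels])
            for lab in self.labels
        }
        return self

    def predict(self, text):
        x = char_ngrams(text)
        return max(self.labels, key=lambda lab: dot(self.weights[lab], x))


if __name__ == "__main__":
    # Toy placeholder strings with disjoint alphabets; NOT real Romansh text.
    texts = ["aaa aab aba", "abb baa aab",
             "ccc ccd cdc", "cdd dcc ccd",
             "eee eef efe", "eff fee eef"]
    labels = ["idiom_a", "idiom_a", "idiom_b", "idiom_b", "rg", "rg"]
    clf = IdiomClassifier().fit(texts, labels)
    print(clf.predict("baa aba"), clf.predict("dcd ccd"), clf.predict("fef eee"))
    # prints: idiom_a idiom_b rg
```

Character n-grams capture spelling conventions without a tokenizer, which is why they are a plausible fit for closely related written varieties. The pure-Python solver only keeps the sketch dependency-free. A real system would more likely use an established linear SVM implementation trained on labelled text for each idiom and for Rumantsch Grischun.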

Problem

Research questions and friction points this paper is trying to address.

Language Identification
Romansh
Idioms
Rumantsch Grischun
Linguistic Diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Language Identification
Romansh idioms
SVM classifier
Rumantsch Grischun
dialect-aware NLP