Robust Language Identification for Romansh Varieties

📅 2026-03-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the task of distinguishing the regional varieties of Romansh, known as idioms, from one another and from the supra-regional variety Rumantsch Grischun, a problem for which there has been a lack of documented language identification (LID) efforts. The work proposes a support vector machine (SVM)-based LID system for Romansh idioms and evaluates it on a newly curated benchmark covering two textual domains. The system reaches an average in-domain accuracy of 97%, which enables downstream applications such as idiom-aware spell checking and machine translation.

📝 Abstract

The Romansh language has several regional varieties, called idioms, which sometimes have limited mutual intelligibility. Despite this linguistic diversity, there has been a lack of documented efforts to build a language identification (LID) system that can distinguish between these idioms. Since Romansh LID should also be able to recognize Rumantsch Grischun, a supra-regional variety that combines elements of several idioms, this makes for a novel and interesting classification problem. In this paper, we present a LID system for Romansh idioms based on an SVM approach. We evaluate our model on a newly curated benchmark across two domains and find that it reaches an average in-domain accuracy of 97%, enabling applications such as idiom-aware spell checking or machine translation. Our classifier is publicly available.
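The abstract states only that the system is "based on an SVM approach"; it does not describe the features or the training procedure. As a rough illustration of how an SVM-based LID classifier can work in general, the sketch below trains a one-vs-rest linear SVM on character n-gram counts using Pegasos-style subgradient descent on the hinge loss. Everything in it is an assumption made for illustration: the n-gram features, the hyperparameters, the function names (`char_ngrams`, `train_ovr_svm`, `predict`), and the training strings, which are invented tokens and not real Romansh text. It is not the authors' implementation.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return L2-normalised character n-gram counts (an assumed feature set)."""
    padded = f" {text.lower()} "  # pad so word boundaries become features
    counts = Counter(
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    )
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {gram: v / norm for gram, v in counts.items()}


def dot(weights, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(g, 0.0) * v for g, v in feats.items())


def train_ovr_svm(texts, labels, epochs=50, lam=0.01):
    """Train one linear SVM per label (one-vs-rest) with hinge-loss updates."""
    X = [char_ngrams(t) for t in texts]
    models = {}
    for cls in sorted(set(labels)):
        w, step = {}, 0
        for _ in range(epochs):
            for feats, label in zip(X, labels):
                step += 1
                eta = 1.0 / (lam * step)          # decaying learning rate
                y = 1.0 if label == cls else -1.0
                violated = y * dot(w, feats) < 1.0  # inside the margin?
                for g in w:                       # L2 regularisation shrink
                    w[g] *= 1.0 - eta * lam
                if violated:                      # hinge-loss subgradient step
                    for g, v in feats.items():
                        w[g] = w.get(g, 0.0) + eta * y * v
        models[cls] = w
    return models


def predict(models, text):
    """Return the label whose SVM gives the highest decision score."""
    feats = char_ngrams(text)
    return max(models, key=lambda cls: dot(models[cls], feats))


# Synthetic toy data: invented tokens, NOT real Romansh idioms.
train_texts = [
    "lozza pazzo mezza",
    "tizzo gazza rozzo",
    "lakko pekka mukko",
    "tikka rokko bakko",
]
train_labels = ["variety_a", "variety_a", "variety_b", "variety_b"]

models = train_ovr_svm(train_texts, train_labels)
print(predict(models, "fazzo nizza"))
print(predict(models, "sukka dekko"))
```

The one-vs-rest layout extends to any number of labels, so the same structure could in principle cover several idioms plus Rumantsch Grischun as one more class. In practice a maintained library implementation of a linear SVM would likely be preferable to this hand-written training loop.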

Problem

Research questions and friction points this paper is trying to address.

Language Identification

Romansh

Idioms

Rumantsch Grischun

Linguistic Diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Language Identification

Romansh idioms

SVM classifier