Expanding the WMT24++ Benchmark with Rumantsch Grischun, Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader

📅 2025-09-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Machine translation (MT) evaluation benchmarks have long been lacking for the six Romansh varieties—Rumantsch Grischun, Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader—hindering progress in low-resource minority-language MT. Method: We construct the first high-quality, multilingual parallel evaluation dataset covering all six varieties, built upon the WMT24++ framework. Professional human translators ensure precise cross-variety alignment and dialectal accuracy. Contribution/Results: This benchmark fills a critical gap in MT evaluation for Swiss minority languages, enabling bidirectional German ↔ Romansh translation assessment. Experiments show that state-of-the-art systems handle Romansh→German relatively well but exhibit limited performance on German→Romansh, particularly in generating variety-specific forms; large language models do not meaningfully close this gap. Our work provides a reproducible, extensible evaluation infrastructure for low-resource dialectal MT research.

📝 Abstract
The Romansh language, spoken in Switzerland, has limited resources for machine translation evaluation. In this paper, we present a benchmark for six varieties of Romansh: Rumantsch Grischun, a supra-regional variety, and five regional varieties: Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader. Our reference translations were created by human translators based on the WMT24++ benchmark, which ensures parallelism with more than 55 other languages. An automatic evaluation of existing MT systems and LLMs shows that translation out of Romansh into German is handled relatively well for all the varieties, but translation into Romansh is still challenging.
Problem

Research questions and friction points this paper is trying to address.

Expanding machine translation benchmarks to include Romansh language varieties
Addressing limited resources for Romansh machine translation evaluation
Evaluating translation challenges between Romansh varieties and German
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expanding WMT24++ benchmark with Romansh varieties
Creating human-translated references for six dialects
Evaluating the performance of existing MT systems and LLMs
Jannis Vamvas
University of Zurich
Ignacio Pérez Prat
Lia Rumantscha
Not Battesta Soliva
University of Zurich
Sandra Baltermia-Guetg
Lia Rumantscha
Andrina Beeli
Lia Rumantscha
Simona Beeli
Lia Rumantscha
Madlaina Capeder
Lia Rumantscha
Laura Decurtins
Lia Rumantscha
Gian Peder Gregori
Lia Rumantscha
Flavia Hobi
Lia Rumantscha
Gabriela Holderegger
Lia Rumantscha
Arina Lazzarini
Lia Rumantscha
Viviana Lazzarini
Lia Rumantscha
Walter Rosselli
Lia Rumantscha
Bettina Vital
Lia Rumantscha
Anna Rutkiewicz
University of Zurich
Rico Sennrich
University of Zurich