SynCED-EnDe 2025: A Synthetic and Curated English - German Dataset for Critical Error Detection in Machine Translation

📅 2025-10-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limitations of the WMT21 English–German Critical Error Detection (CED) dataset—including small scale, severe label imbalance, narrow domain coverage, and outdated content—this work introduces a novel English–German CED dataset. Methodologically, it proposes fine-grained semantic error subclassification, structured trigger identification, and multi-dimensional auxiliary annotations (e.g., error span, severity, and impact scope), moving beyond conventional binary classification to enable systematic modeling of error complexity. Benchmark experiments employ XLM-R-based encoders, and the dataset is constructed by integrating synthetically generated data with human-curated high-quality annotations to ensure balance and fidelity. Compared to WMT21, the new dataset achieves substantial improvements in label balance (near 1:1 positive-to-negative ratio), annotation granularity (five error types plus three auxiliary dimensions), and domain relevance/timeliness, yielding an 8.2% absolute F1-score gain for CED models. The dataset, code, and documentation are publicly released.

Technology Category

Application Category

📝 Abstract
Critical Error Detection (CED) in machine translation aims to determine whether a translation is safe to use or contains unacceptable deviations in meaning. While the WMT21 English-German CED dataset provided the first benchmark, it is limited in scale, label balance, domain coverage, and temporal freshness. We present SynCED-EnDe, a new resource consisting of 1,000 gold-labeled and 8,000 silver-labeled sentence pairs, balanced 50/50 between error and non-error cases. SynCED-EnDe draws from diverse 2024-2025 sources (StackExchange, GOV.UK) and introduces explicit error subclasses, structured trigger flags, and fine-grained auxiliary judgments (obviousness, severity, localization complexity, contextual dependency, adequacy deviation). These enrichments enable systematic analyses of error risk and intricacy beyond binary detection. The dataset is permanently hosted on GitHub and Hugging Face, accompanied by documentation, annotation guidelines, and baseline scripts. Benchmark experiments with XLM-R and related encoders show substantial performance gains over WMT21 due to balanced labels and refined annotations. We envision SynCED-EnDe as a community resource to advance safe deployment of MT in information retrieval and conversational assistants, particularly in emerging contexts such as wearable AI devices.
Problem

Research questions and friction points this paper is trying to address.

Addresses limited scale and label imbalance in machine translation error detection
Expands domain coverage and temporal freshness of error detection datasets
Enables systematic analysis of error risk beyond binary classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Created balanced synthetic dataset with gold and silver labels
Introduced fine-grained error subclasses and auxiliary judgments
Achieved performance gains using XLM-R encoders on dataset
🔎 Similar Papers
No similar papers found.