Evaluating the Impact of Verbal Multiword Expressions on Machine Translation

📅 2025-08-24

🤖 AI Summary

Verbal multiword expressions (VMWEs), including verbal idioms, phrasal verbs (verb-particle constructions), and light verb constructions, pose significant challenges for machine translation (MT). Their meaning is often non-compositional, and this frequently causes errors in multilingual MT systems. Method: We propose an LLM-based rewriting approach that automatically replaces non-literal VMWEs with semantically equivalent literal paraphrases before the sentence is passed to the MT system. Contribution/Results: Evaluated on multilingual parallel corpora and benchmark VMWE datasets for English→German/French/Spanish/Chinese, the study shows that VMWEs substantially degrade translation quality (average BLEU reduction of 2.1–4.7 points). Rewriting yields statistically significant improvements of up to +3.9 BLEU, particularly for verbal idioms and phrasal verbs. These results support the “rewrite-then-translate” paradigm as a scalable, LLM-driven way to improve the translation of non-compositional expressions.
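
The rewrite-then-translate idea can be illustrated with a minimal sketch. This is not the authors' implementation: a small hand-written lexicon (`TOY_LEXICON`, hypothetical) stands in for the LLM paraphraser, and `translate` is any MT callable you supply. All function names here are illustrative assumptions.

```python
import re
from typing import Callable, Dict

# Hypothetical toy lexicon standing in for the LLM paraphraser:
# each non-literal VMWE maps to a literal paraphrase.
TOY_LEXICON: Dict[str, str] = {
    "kicked the bucket": "died",   # verbal idiom
    "gave up": "stopped",          # verb-particle construction
    "took a walk": "walked",       # light verb construction
}


def lexicon_paraphrase(sentence: str) -> str:
    """Replace every lexicon VMWE with its literal paraphrase.

    Matching is case-insensitive and restricted to word boundaries;
    the replacement is always inserted in lowercase.
    """
    for vmwe, literal in TOY_LEXICON.items():
        pattern = r"\b" + re.escape(vmwe) + r"\b"
        sentence = re.sub(pattern, literal, sentence, flags=re.IGNORECASE)
    return sentence


def rewrite_then_translate(sentence: str,
                           paraphrase: Callable[[str], str],
                           translate: Callable[[str], str]) -> str:
    """Rewrite the source sentence first, then translate the literal version."""
    return translate(paraphrase(sentence))
```

With an identity function standing in for MT, `rewrite_then_translate("He kicked the bucket last year.", lexicon_paraphrase, lambda s: s)` returns `"He died last year."`. Exact string matching like this misses discontinuous VMWEs such as "gave it up", a limitation that a context-aware LLM rewriter can plausibly avoid.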

📝 Abstract

Verbal multiword expressions (VMWEs) present significant challenges for natural language processing due to their complex and often non-compositional nature. While machine translation models have improved significantly with the advent of language models in recent years, accurately translating these complex linguistic structures remains an open problem. In this study, we analyze the impact of three VMWE categories (verbal idioms, verb-particle constructions, and light verb constructions) on machine translation quality from English into multiple languages. Using both established multiword expression datasets and sentences containing these phenomena extracted from machine translation datasets, we evaluate how state-of-the-art translation systems handle these expressions. Our experimental results consistently show that VMWEs negatively affect translation quality. We also propose an LLM-based paraphrasing approach that replaces these expressions with their literal counterparts, demonstrating significant improvement in translation quality for verbal idioms and verb-particle constructions.
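
The abstract reports significant improvements without naming the significance test used. One common way to compare two MT outputs on the same test set is paired bootstrap resampling; the sketch below applies it to per-sentence quality scores for translations of the original versus the paraphrased source. The function name and the use of mean sentence-level scores (rather than recomputing corpus-level BLEU on each resample) are simplifying assumptions.

```python
import random
from typing import Sequence


def paired_bootstrap(scores_a: Sequence[float],
                     scores_b: Sequence[float],
                     n_samples: int = 1000,
                     seed: int = 0) -> float:
    """Approximate one-sided p-value for "system B scores higher than system A".

    scores_a and scores_b are per-sentence scores for the SAME sentences,
    e.g. translations of the original source (A) and of the paraphrased
    source (B). Each bootstrap sample draws sentence indices with
    replacement, and both systems are scored on the same drawn indices.
    Returns the fraction of samples in which B's mean does not exceed A's.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    n = len(scores_a)
    not_better = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_b <= mean_a:
            not_better += 1
    return not_better / n_samples
```

A returned value below 0.05 would suggest that the paraphrased inputs score higher than the originals beyond resampling noise. For corpus-level BLEU, toolkits such as sacreBLEU provide paired significance tests that recompute the metric on each resample.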

Problem

Research questions and friction points this paper is trying to address.

Evaluating how well machine translation systems handle verbal multiword expressions

Measuring the impact of three VMWE categories on translation quality

Addressing the non-compositional meaning of VMWEs in translation systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based paraphrasing that replaces VMWEs with literal equivalents

Systematic evaluation of VMWEs in state-of-the-art translation systems

Improved translation of verbal idioms and verb-particle constructions
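
The LLM-based paraphrasing step amounts to prompting a model to rewrite the source sentence literally. The exact prompt used in the study is not given here, so the wording below and the name `build_paraphrase_prompt` are hypothetical; only the three category names come from the abstract. The sketch builds the prompt string and leaves the model call to whatever LLM client you use.

```python
from typing import Sequence

# Category names taken from the abstract; the prompt wording is hypothetical.
DEFAULT_CATEGORIES = ("verbal idioms", "verb-particle constructions",
                      "light verb constructions")


def build_paraphrase_prompt(sentence: str,
                            categories: Sequence[str] = DEFAULT_CATEGORIES) -> str:
    """Build an instruction asking an LLM to replace VMWEs with literal wording."""
    category_list = ", ".join(categories)
    return (
        "Rewrite the following English sentence so that any "
        f"{category_list} are replaced by literal expressions with the same "
        "meaning. Keep the rest of the sentence unchanged and output only the "
        "rewritten sentence.\n\n"
        f"Sentence: {sentence}"
    )
```

Passing the categories as a parameter makes it easy to restrict rewriting to a single VMWE type, for example to compare idioms against light verb constructions separately.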
