Simultaneous Machine Translation with Large Language Models

📅 2023-09-13

🏛️ Australasian Language Technology Association Workshop

📈 Citations: 7

✨ Influential: 2

🤖 AI Summary

Real-world simultaneous machine translation (SimulMT) systems must handle noisy input, long contexts, and knowledge injection in addition to the quality-latency trade-off, and dedicated MT models often lack the language understanding and generation capabilities these demands require. This paper explores large language models (LLMs) for SimulMT, combining existing incremental-decoding methods with a newly proposed RALCP (Relaxed Agreement Longest Common Prefix) algorithm to reduce latency. In experiments with Llama2-7b-chat on nine languages from the MUST-C dataset, the LLM outperforms dedicated MT models on BLEU and LAAL. Key contributions include: (1) empirical evidence that an LLM can outperform dedicated MT models on both quality (BLEU) and latency (LAAL) in SimulMT; (2) the RALCP algorithm for latency reduction during incremental decoding; and (3) further analysis indicating advantages in tuning efficiency and robustness. Computational cost remains a significant obstacle to deployment.

📝 Abstract

Real-world simultaneous machine translation (SimulMT) systems face more challenges than just the quality-latency trade-off. They also need to address issues related to robustness with noisy input, processing long contexts, and flexibility for knowledge injection. These challenges demand models with strong language understanding and generation capabilities, which dedicated MT models often lack. In this paper, we investigate the possibility of applying Large Language Models (LLMs) to SimulMT tasks by using existing incremental-decoding methods with a newly proposed RALCP algorithm for latency reduction. We conducted experiments using the Llama2-7b-chat model on nine different languages from the MUST-C dataset. The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics. Further analysis indicates that the LLM has advantages in terms of tuning efficiency and robustness. However, it is important to note that the computational cost of LLMs remains a significant obstacle to their application in SimulMT.

Problem

Research questions and friction points this paper is trying to address.

Addressing the quality-latency trade-off in simultaneous machine translation systems

Improving robustness with noisy input and long context processing

Enhancing flexibility for knowledge injection in translation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Large Language Models for simultaneous translation

Applying the newly proposed RALCP algorithm to reduce latency

Leveraging incremental-decoding methods for efficiency
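
This page does not describe how RALCP works internally, so the following is only an illustrative sketch of the general idea behind prefix-agreement policies in incremental decoding: several candidate translations of the partial source are compared, and only the leading tokens they agree on are committed as output. The sketch assumes that the agreement requirement is relaxed from unanimity to a vote. The function name `relaxed_common_prefix`, the `threshold` parameter, and the voting rule are hypothetical and are not taken from the paper's implementation.

```python
from collections import Counter
from typing import List, Sequence


def relaxed_common_prefix(
    candidates: Sequence[Sequence[str]], threshold: float = 0.6
) -> List[str]:
    """Return the longest prefix backed by at least `threshold` of all candidates.

    Hypothetical illustration: with threshold=1.0 this reduces to the strict
    longest common prefix; lower values commit tokens earlier (lower latency)
    at the risk of committing a token that later context would have changed.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    if not candidates:
        return []

    needed = threshold * len(candidates)  # votes required at every position
    active = list(candidates)             # candidates still matching the prefix
    prefix: List[str] = []
    position = 0

    while True:
        # Count next-token votes among candidates that still match the prefix.
        votes = Counter(c[position] for c in active if len(c) > position)
        if not votes:
            break
        token, count = votes.most_common(1)[0]
        if count < needed:
            break
        prefix.append(token)
        # Keep only the candidates that voted for the accepted token.
        active = [c for c in active if len(c) > position and c[position] == token]
        position += 1

    return prefix


beams = [
    ["the", "cat", "sat"],
    ["the", "cat", "sits"],
    ["the", "dog", "sat"],
]
strict = relaxed_common_prefix(beams, threshold=1.0)
relaxed = relaxed_common_prefix(beams, threshold=0.6)
```

In this toy case the strict setting commits only the first token, while the relaxed setting commits two, which is the kind of latency reduction a voting-based prefix rule aims for. In a real system the candidates would come from the model's beam search over the partial source rather than a hard-coded list.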

🔎 Similar Papers

Conversational SimulMT: Efficient Simultaneous Translation with Large Language Models