🤖 AI Summary
To address the fundamental trade-off between low latency and high translation quality in real-time speech translation, this paper proposes a simultaneous machine translation method built on large language model (LLM)-guided lookahead prediction. Unlike conventional approaches that rely solely on already-received source tokens, the method employs an LLM to predict upcoming source words and introduces a risk-aware lookahead framework, Translation by Anticipating Future (TAF), that guides incremental target generation without significantly increasing latency. By tightly integrating LLM-based prediction with a simultaneous translation architecture, the approach achieves state-of-the-art latency-quality trade-offs across four language directions: English to Chinese, Japanese, Korean, and German. At the same latency of three words, it improves BLEU scores by up to 5 points over strong baselines. The implementation is publicly available.
📄 Abstract
Simultaneous machine translation (SMT) takes streaming input utterances and incrementally produces target text. Existing SMT methods mainly use the partial utterance that has already arrived at the input and the generated hypothesis. Motivated by human interpreters' technique of forecasting future words before hearing them, we propose $\textbf{T}$ranslation by $\textbf{A}$nticipating $\textbf{F}$uture (TAF), a method to improve translation quality while retaining low latency. Its core idea is to use a large language model (LLM) to predict future source words and opportunistically translate without introducing too much risk. We evaluate TAF and multiple SMT baselines on four language directions. Experiments show that TAF achieves the best translation quality-latency trade-off and outperforms the baselines by up to 5 BLEU points at the same latency (three words). Code is released at https://github.com/owaski/TAF
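The core idea, predicting several possible source continuations with an LLM and committing only the target tokens on which all resulting translations agree, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `predict_continuations` and `translate` are toy stand-ins for the LLM and the MT model, and the agreement rule is one simple way to realize "opportunistically translate without introducing too much risk".

```python
def predict_continuations(source_prefix, num_samples=3, horizon=2):
    """Stand-in for an LLM sampling plausible futures of the source prefix.
    A real system would sample from the LLM; this mock is an assumption."""
    futures = [["world", "!"], ["world", "?"], ["there", "!"]]
    return [source_prefix + f[:horizon] for f in futures[:num_samples]]

def translate(source_tokens):
    """Stand-in for an MT model: toy word-for-word lookup (assumption)."""
    table = {"hello": "bonjour", "world": "monde", "there": "là",
             "!": "!", "?": "?"}
    return [table.get(t, t) for t in source_tokens]

def taf_step(source_prefix, committed_target, num_samples=3):
    """One incremental step: translate each sampled future and commit only
    the target tokens that agree across all hypotheses (low-risk tokens)."""
    hyps = [translate(s)
            for s in predict_continuations(source_prefix, num_samples)]
    out = list(committed_target)
    i = len(out)
    while all(len(h) > i for h in hyps):
        candidates = {h[i] for h in hyps}
        if len(candidates) == 1:      # every sampled future agrees
            out.append(hyps[0][i])
            i += 1
        else:                         # disagreement: wait for more input
            break
    return out

print(taf_step(["hello"], []))  # → ['bonjour']
```

Here only "bonjour" is committed: all sampled futures translate the first word identically, while the second target word depends on which continuation actually arrives, so the system waits for more source input.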