Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies

📅 2025-09-26

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of conventional simultaneous machine translation (SiMT): policies restricted to READ/WRITE actions struggle to balance translation quality and latency under strict real-time constraints. The authors propose four adaptive actions (Sentence_Cut, Drop, Partial_Summarization, and Pronominalization) that formalize human simultaneous interpretation strategies as operations for real-time restructuring, omission, and simplification. These actions are adapted to a large language model (LLM) framework, with training references constructed through action-aware prompting. For evaluation, the authors develop a latency-aware TTS pipeline that maps textual outputs to speech with realistic timing, so that quality and word-level monotonicity can be assessed together. On the ACL60/60 English-Chinese, English-German, and English-Japanese benchmarks, the approach consistently improves semantic metrics and achieves lower delay than reference translations and salami-based baselines. Combining Drop and Sentence_Cut yields consistent improvements in the balance between fluency and latency.
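To make the extended action space concrete, here is a minimal Python sketch of how the six actions could be represented and how an action-aware prompt might be assembled. The action names come from the paper. Everything else (the `ACTION_HINTS` glosses, the `build_prompt` helper, and the prompt wording) is a hypothetical illustration, not the authors' actual prompt or code.

```python
from enum import Enum


class Action(Enum):
    # Conventional SiMT actions
    READ = "READ"
    WRITE = "WRITE"
    # The four adaptive actions named in the paper
    SENTENCE_CUT = "Sentence_Cut"
    DROP = "Drop"
    PARTIAL_SUMMARIZATION = "Partial_Summarization"
    PRONOMINALIZATION = "Pronominalization"


# Hypothetical one-line glosses written for this sketch; they paraphrase the
# "restructuring, omission, and simplification" roles and are NOT the paper's wording.
ACTION_HINTS = {
    Action.SENTENCE_CUT: "split a long sentence into shorter ones that can be delivered early",
    Action.DROP: "omit redundant or low-information words",
    Action.PARTIAL_SUMMARIZATION: "condense a verbose span into a shorter paraphrase",
    Action.PRONOMINALIZATION: "replace a repeated noun phrase with a pronoun",
}


def build_prompt(source: str, target_lang: str, actions: list[Action]) -> str:
    """Assemble an illustrative action-aware prompt for an LLM (assumed format)."""
    lines = [
        f"Translate the following English text into {target_lang} "
        "as a simultaneous interpreter would."
    ]
    if actions:
        lines.append("You may use these strategies:")
        for action in actions:
            lines.append(f"- {action.value}: {ACTION_HINTS[action]}")
    lines.append(f"Source: {source}")
    return "\n".join(lines)
```

For example, `build_prompt("Hello world", "German", [Action.DROP, Action.SENTENCE_CUT])` would request only the Drop and Sentence_Cut strategies, the combination the paper highlights.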

📝 Abstract

Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional policies with only READ/WRITE actions cannot fully address. We extend the action space of SiMT with four adaptive actions: Sentence_Cut, Drop, Partial_Summarization and Pronominalization, which enable real-time restructuring, omission, and simplification while preserving semantic fidelity. We adapt these actions in a large language model (LLM) framework and construct training references through action-aware prompting. To evaluate both quality and word-level monotonicity, we further develop a latency-aware TTS pipeline that maps textual outputs to speech with realistic timing. Experiments on the ACL60/60 English-Chinese, English-German and English-Japanese benchmarks show that our framework consistently improves semantic metrics and achieves lower delay compared to reference translations and salami-based baselines. Notably, combining Drop and Sentence_Cut leads to consistent improvements in the balance between fluency and latency. These results demonstrate that enriching the action space of LLM-based SiMT provides a promising direction for bridging the gap between human and machine interpretation.
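The abstract does not spell out how the latency-aware TTS pipeline computes timing, so the following is only a rough sketch of the general idea: once text is spoken aloud, each output chunk occupies playback time, and a chunk cannot start until the previous one has finished. The `Segment` type, the fixed `units_per_second` speaking rate (a stand-in for a real TTS engine), and the `average_delay` definition are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str        # target-side text chunk
    ready_at: float  # seconds: when the system emitted this chunk
    n_units: int     # spoken units in the chunk (words or characters)


def speech_schedule(segments, units_per_second=3.0):
    """Return (start, end) playback times for each segment.

    Assumption: a constant speaking rate approximates TTS duration.
    A segment starts once it is ready AND the previous one has finished.
    """
    schedule = []
    prev_end = 0.0
    for seg in segments:
        start = max(seg.ready_at, prev_end)
        end = start + seg.n_units / units_per_second
        schedule.append((start, end))
        prev_end = end
    return schedule


def average_delay(segments, source_end_times, units_per_second=3.0):
    """Mean gap between the end of each source span and the end of its spoken output."""
    schedule = speech_schedule(segments, units_per_second)
    delays = [end - src_end for (_, end), src_end in zip(schedule, source_end_times)]
    return sum(delays) / len(delays)
```

Under this simplified model, a shorter output (for instance after a Drop or a summarization) shortens playback time, and that saving carries over to every later segment, which is one way to see why omission and simplification can lower delay.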

Problem

Research questions and friction points this paper is trying to address.

Simultaneous Machine Translation

Real-Time Constraints

Action Space

Semantic Fidelity

Latency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Simultaneous Machine Translation

Action Space Extension

Large Language Model