Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional simultaneous machine translation (SiMT) struggles to balance translation quality and low latency under strict real-time constraints: incremental READ/WRITE strategies alone suffer from an inherent trade-off between semantic fidelity and responsiveness. This paper proposes a human-like simultaneous interpreting framework that introduces four adaptive actions (SENTENCE_CUT, DROP, PARTIAL_SUMMARIZATION, and PRONOMINALIZATION) into the SiMT action space, enabling joint control of semantic compression and output pacing. Building on a decoder-only large language model, the authors design action-aware prompting to construct training references and establish a latency-aware text-to-speech (TTS) evaluation pipeline. On the ACL60/60 English-Chinese and English-German benchmarks, the method improves COMET-KIWI scores while reducing Average Lagging, outperforming reference translations and salami-based baselines.

📝 Abstract
Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional encoder-decoder policies with only READ/WRITE actions cannot fully address. We extend the action space of SiMT with four adaptive actions: SENTENCE_CUT, DROP, PARTIAL_SUMMARIZATION and PRONOMINALIZATION, which enable real-time restructuring, omission, and simplification while preserving semantic fidelity. We implement these actions in a decoder-only large language model (LLM) framework and construct training references through action-aware prompting. To evaluate both quality and latency, we further develop a latency-aware TTS pipeline that maps textual outputs to speech with realistic timing. Experiments on the ACL60/60 English-Chinese and English-German benchmarks show that our framework consistently improves semantic metrics (e.g., COMET-KIWI) and achieves lower delay (measured by Average Lagging) compared to reference translations and salami-based baselines. Notably, combining DROP and SENTENCE_CUT yields the best overall balance between fluency and latency. These results demonstrate that enriching the action space of LLM-based SiMT provides a promising direction for bridging the gap between human and machine interpretation.
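The abstract reports delay with Average Lagging (AL). As context, the standard text-based AL formulation (Ma et al., 2019) can be sketched as below; this is a minimal illustration of the common metric, not the paper's latency-aware TTS variant, and the function name and argument layout are my own.

```python
def average_lagging(g, src_len, tgt_len):
    """Average Lagging (AL) for a fixed read/write policy.

    g is a list of length tgt_len where g[t] is the number of source
    tokens read before emitting target token t (0-indexed here).
    Lower AL means the system lags less behind the speaker.
    """
    gamma = tgt_len / src_len  # target-to-source length ratio
    # tau: 1-based index of the first target token emitted only after
    # the entire source has been read (cut-off for the average)
    tau = next((t + 1 for t, gt in enumerate(g) if gt >= src_len), tgt_len)
    # AL = (1/tau) * sum_{t=1..tau} [ g(t) - (t-1)/gamma ]
    return sum(g[t] - t / gamma for t in range(tau)) / tau
```

For example, a wait-1 policy on a 3-token source and 3-token target (g = [1, 2, 3]) yields AL = 1.0, while full-sentence translation (g = [3, 3, 3]) yields AL = 3.0, reflecting its higher latency.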
Problem

Research questions and friction points this paper is trying to address.

Traditional encoder-decoder policies with only READ/WRITE actions cannot resolve the quality-latency trade-off
Human interpreters restructure, omit, and compress in ways existing SiMT action spaces do not capture
Text-only evaluation ignores realistic speech timing, leaving latency under-measured
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extended action space with four adaptive strategies
Implemented in decoder-only LLM with action-aware prompting
Latency-aware TTS pipeline for speech timing evaluation
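The extended action space above can be sketched as a small enum plus a decision step. The action names come from the abstract; the trigger heuristics, thresholds, and function names below are illustrative assumptions, not the paper's learned LLM policy.

```python
from enum import Enum, auto

class Action(Enum):
    # Baseline incremental actions
    READ = auto()    # consume one more source token
    WRITE = auto()   # emit one target token
    # Adaptive actions proposed in the paper (names per the abstract)
    SENTENCE_CUT = auto()           # cut a long span and translate it early
    PARTIAL_SUMMARIZATION = auto()  # compress a span instead of translating fully
    DROP = auto()                   # omit redundant or filler content
    PRONOMINALIZATION = auto()      # replace a repeated entity with a pronoun

def choose_action(buffer, seen_entities, max_span=12):
    """Toy rule-based controller over the extended action space.

    The real system decides via an action-aware LLM; these hand-written
    rules only illustrate when each action type could plausibly fire.
    """
    if len(buffer) >= max_span:            # span too long to keep buffering
        return Action.SENTENCE_CUT
    if len(buffer) >= max_span // 2:       # long enough to compress
        return Action.PARTIAL_SUMMARIZATION
    if buffer and buffer[-1] in {"um", "uh"}:   # disfluency: safe to omit
        return Action.DROP
    if buffer and buffer[-1] in seen_entities:  # repeated entity mention
        return Action.PRONOMINALIZATION
    return Action.READ                     # default: keep reading
```

WRITE is omitted from the toy rules since deciding when to emit a token would depend on decoder state; the sketch only shows how the four adaptive actions enlarge the choice set beyond READ/WRITE.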