Non-Monotonic Attention-based Read/Write Policy Learning for Simultaneous Translation

📅 2025-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Simultaneous machine translation (SiMT) faces a fundamental trade-off between low latency and high translation quality. To address this, we propose a dynamic read/write policy learning framework based on non-monotonic attention. Our method leverages a high-quality offline pre-trained seq2seq model and employs source–target alignment points—derived from word-level alignments—as weak supervision signals to train a lightweight binary classifier that learns an adjustable triggering decision boundary. Crucially, this is the first approach to explicitly incorporate alignment information as weak supervision for policy learning, enabling continuous, inference-time control over the latency–quality trade-off. Experiments on multiple streaming benchmarks demonstrate that our method significantly outperforms strong baselines: BLEU scores closely approach those of offline translation models, while maintaining millisecond-level latency. This effectively alleviates the inherent quality–latency tension in SiMT.

📝 Abstract
Simultaneous or streaming machine translation generates translation while reading the input stream. These systems face a quality/latency trade-off, aiming to achieve high translation quality similar to non-streaming models with minimal latency. We propose an approach that efficiently manages this trade-off. By enhancing a pretrained non-streaming model, which was trained with a seq2seq mechanism and represents the upper bound in quality, we convert it into a streaming model by utilizing the alignment between source and target tokens. This alignment is used to learn a read/write decision boundary for reliable translation generation with minimal input. During training, the model learns the decision boundary through a read/write policy module, employing supervised learning on the alignment points (pseudo labels). The read/write policy module, a small binary classification unit, can control the quality/latency trade-off during inference. Experimental results show that our model outperforms several strong baselines and narrows the gap with the non-streaming baseline model.
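The abstract describes turning word-level alignments into pseudo labels for the read/write policy: a target token should only be written once its right-most aligned source token has been read. A minimal sketch of that labeling step, under the assumption that alignments are (source index, target index) pairs (the function name and exact convention are illustrative, not from the paper):

```python
from typing import List, Tuple

def alignment_pseudo_labels(
    alignment: List[Tuple[int, int]], num_target: int
) -> List[int]:
    """For each target position j, return how many source tokens must be
    read before target token j can be written: one past the right-most
    source index aligned to j, made monotonically non-decreasing over j
    (a later token can never be written with fewer reads than an earlier one).
    Unaligned target tokens inherit the previous position's label."""
    # Right-most aligned source index per target position (-1 = unaligned).
    rightmost = [-1] * num_target
    for src, tgt in alignment:
        rightmost[tgt] = max(rightmost[tgt], src)
    labels, seen = [], 0
    for j in range(num_target):
        seen = max(seen, rightmost[j] + 1)
        labels.append(seen)
    return labels
```

For example, with alignment `[(0, 0), (2, 1), (1, 2)]` over three target tokens, the second target token aligns to source position 2, so it forces three reads, and the third token (aligned only to source position 1) inherits that count: the labels are `[1, 3, 3]`. These per-position read counts are what a small binary classifier can then be trained against as supervision.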
Problem

Research questions and friction points this paper is trying to address.

Balancing translation quality and latency in simultaneous machine translation
Converting pretrained non-streaming models into efficient streaming models
Learning read/write policies using alignment-based pseudo labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhances a pretrained non-streaming model using source–target alignments
Learns the read/write policy via supervised training on alignment-derived pseudo labels
Small binary classification module controls the quality/latency trade-off at inference
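The third point, inference-time control via the binary module, can be illustrated with a toy streaming loop: at each step the policy produces a write probability, and raising the decision threshold makes it read more source tokens before each write, trading latency for confidence. All names and scores below are illustrative, not the paper's implementation:

```python
from typing import List, Tuple

def simulate_policy(
    write_probs: List[float], threshold: float, num_source: int
) -> Tuple[int, float]:
    """Walk a precomputed stream of write probabilities (one per decoding
    step). WRITE when prob >= threshold (and at least one source token has
    been read), otherwise READ the next source token.
    Returns (tokens written, average source tokens read per write),
    where the second value is a simple latency proxy."""
    read, writes, lag_sum = 0, 0, 0.0
    for p in write_probs:
        if p >= threshold and read > 0:
            writes += 1
            lag_sum += read  # reads accumulated so far at this write
        elif read < num_source:
            read += 1
    if writes == 0:
        return 0, float(num_source)
    return writes, lag_sum / writes
```

With the same probability stream, a higher threshold yields fewer, later writes (higher average lag), which is exactly the continuous latency/quality knob the summary attributes to the classifier's adjustable decision boundary.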