Joint Training and Decoding for Multilingual End-to-End Simultaneous Speech Translation

πŸ“… 2023-06-04
πŸ›οΈ IEEE International Conference on Acoustics, Speech, and Signal Processing
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ€– AI Summary
This paper addresses one-to-many end-to-end simultaneous speech translation (SimulST), a multilingual setting close to real-world applications. The authors explore two architectures for joint synchronous training, a separate-decoder architecture and a unified architecture, and further propose an asynchronous training strategy on the unified decoder to strengthen cross-lingual knowledge transfer. As a benchmark testbed, they curate TED-MMST, a multi-way aligned multilingual end-to-end ST dataset. Experiments on TED-MMST demonstrate favorable trade-offs between translation quality (BLEU) and latency (Average Lagging, AL). Both the code and the TED-MMST dataset are open-sourced.

πŸ“ Abstract
Recent studies on end-to-end speech translation (ST) have facilitated the exploration of multilingual end-to-end ST and end-to-end simultaneous ST. In this paper, we investigate end-to-end simultaneous speech translation in a one-to-many multilingual setting, which is closer to applications in real scenarios. We explore a separate decoder architecture and a unified architecture for joint synchronous training in this scenario. To further explore knowledge transfer across languages, we propose an asynchronous training strategy on the proposed unified decoder architecture. A multi-way aligned multilingual end-to-end ST dataset was curated as a benchmark testbed to evaluate our methods. Experimental results demonstrate the effectiveness of our models on the collected dataset. Our code and data are available at: https://github.com/XiaoMi/TED-MMST.
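The abstract does not spell out the read/write policy used at inference time. For orientation, the standard fixed policy in SimulST work is wait-k (read k source tokens, then alternate one write per read); the sketch below illustrates that generic schedule and is an assumption for illustration, not necessarily the policy this paper adopts.

```python
def wait_k_read_schedule(k, src_len, tgt_len):
    """g(t): number of source tokens read before emitting target token t (1-indexed).

    A wait-k policy first reads k source tokens, then alternates one
    write per additional read until the source is exhausted.
    """
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]

# e.g. k=2 over a 4-token source and a 5-token target:
# read 2 tokens up front, then read/write in lockstep, then finish writing.
schedule = wait_k_read_schedule(2, src_len=4, tgt_len=5)
# schedule == [2, 3, 4, 4, 4]
```

In a one-to-many setting, the same schedule can be shared across target languages while decoders (separate or unified) consume the partial source encoding.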
Problem

Research questions and friction points this paper is trying to address.

How can end-to-end simultaneous speech translation be extended to a one-to-many multilingual setting?
Which joint training architecture, separate decoders or a unified decoder, best suits real-world applications?
How can knowledge be transferred across target languages during training?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint synchronous training with separate decoder architecture
Asynchronous training on unified decoder architecture
Multi-way aligned multilingual end-to-end ST dataset (TED-MMST) for evaluation
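The card mentions a quality/latency trade-off but does not define the latency metric. Latency in SimulST is commonly reported as Average Lagging (AL, Ma et al. 2019); the sketch below implements that standard metric as an illustration, under the assumption that `g[t-1]` gives the number of source tokens read before target token `t` is emitted.

```python
def average_lagging(g, src_len, tgt_len):
    """Average Lagging (AL): mean delay, in source tokens, behind an
    ideal translator that writes at the target/source length ratio.

    g[t-1] is the number of source tokens read before target token t.
    """
    gamma = tgt_len / src_len  # ideal write rate (target tokens per source token)
    # tau: first decoding step at which the full source has been read
    tau = next(t for t in range(1, tgt_len + 1) if g[t - 1] >= src_len)
    return sum(g[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau

# Latency of a wait-2 schedule over a 4-token source, 5-token target:
al = average_lagging([2, 3, 4, 4, 4], src_len=4, tgt_len=5)
# al == 2.2 (lags the ideal translator by about two source tokens)
```

Lower AL means lower latency; the paper's reported trade-off curves plot BLEU against such a lagging measure.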