🤖 AI Summary
This paper addresses the practical need for one-to-many end-to-end simultaneous speech translation (SimulST) in real-world multilingual scenarios. To this end, we propose the first unified modeling framework supporting joint multilingual training and real-time decoding. Methodologically, we introduce a novel hybrid synchronous/asynchronous training paradigm: asynchronous multilingual pretraining enhances cross-lingual knowledge transfer, while synchronous fine-tuning preserves low-latency constraints. We additionally design a hybrid unified/separate decoder to balance decoding efficiency and translation quality. We further construct TED-MMST, the first publicly available, multi-way aligned, multilingual end-to-end SimulST benchmark dataset. Experiments demonstrate that our approach achieves a superior trade-off between translation quality (BLEU) and latency (Average Lagging, AL) on TED-MMST. Both the codebase and the TED-MMST dataset are open-sourced.
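The latency side of the quality/latency trade-off is conventionally measured with Average Lagging (AL), introduced with the STACL wait-k policy. Below is a minimal sketch of that standard definition, not code from this paper; `delays[t]` is assumed to be the number of source segments read before emitting target token `t+1`.

```python
def average_lagging(delays, src_len, tgt_len):
    """Average Lagging (AL), per the standard STACL definition (sketch).

    delays: list where delays[t-1] = g(t), the number of source segments
            consumed before the t-th target token is emitted. Assumed to
            reach src_len by the last token.
    """
    gamma = tgt_len / src_len  # target-to-source length ratio
    # tau: first target step at which the full source has been read
    tau = next(t for t, g in enumerate(delays, 1) if g >= src_len)
    # Average gap between actual delay and the ideal diagonal policy
    return sum(delays[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau
```

For a wait-k policy on equal-length source and target, g(t) = min(t + k - 1, |x|), and AL evaluates to exactly k, which matches the intuition that the decoder lags k segments behind the speaker.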
📝 Abstract
Recent studies on end-to-end speech translation (ST) have facilitated the exploration of multilingual end-to-end ST and end-to-end simultaneous ST. In this paper, we investigate end-to-end simultaneous speech translation in a one-to-many multilingual setting, which is closer to real-world applications. We explore a separate decoder architecture and a unified architecture for joint synchronous training in this scenario. To further exploit knowledge transfer across languages, we propose an asynchronous training strategy on the proposed unified decoder architecture. A multi-way aligned multilingual end-to-end ST dataset was curated as a benchmark testbed to evaluate our methods. Experimental results demonstrate the effectiveness of our models on the collected dataset. Our code and data are available at: https://github.com/XiaoMi/TED-MMST.
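To make the real-time decoding setting concrete, the following is a minimal sketch of a generic wait-k simultaneous decoding loop, a standard SimulST policy; it is illustrative only, not the paper's method. `translate_step` is a hypothetical callback standing in for the model, and in a one-to-many system the target language would typically be selected by a language tag conditioning the decoder.

```python
def wait_k_decode(speech_chunks, k, translate_step, eos="</s>"):
    """Generic wait-k read/write loop (illustrative sketch).

    Reads k source chunks before emitting the first target token, then
    alternates one READ per WRITE. `translate_step(source, target)` is a
    hypothetical model call returning the next target token given the
    source read so far and the target prefix.
    """
    source, target = [], []
    i = 0
    while True:
        # READ: consume a chunk while the policy still lags the source
        if len(target) + k > len(source) and i < len(speech_chunks):
            source.append(speech_chunks[i])
            i += 1
            continue
        # WRITE: otherwise emit the next target token
        token = translate_step(source, target)
        if token == eos:
            break
        target.append(token)
    return target
```

The same loop runs for each target language in a one-to-many setup; the separate-decoder and unified-decoder architectures discussed in the paper differ in whether each language has its own WRITE module or all languages share one.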