Overcoming Latency Bottlenecks in On-Device Speech Translation: A Cascaded Approach with Alignment-Based Streaming MT

📅 2025-08-18

🤖 AI Summary

To address the latency bottleneck of cascading ASR and MT for real-time, on-device streaming speech translation, this paper proposes an alignment-based cascaded translation framework. Methodologically, it uses linguistic cues from the ASR output, such as word boundaries and semantic units, to guide streaming MT decoding, and it combines time-out and forced-finalization beam-search pruning to reduce latency significantly without compromising translation quality. An alignment-aware context management mechanism further improves consistency between the two modules. Evaluated on a pipeline of RNN-Transducer ASR and streaming MT for bilingual conversational translation, the framework reports a +2.1 BLEU improvement and a 38% reduction in average latency over the baseline, substantially narrowing the gap with offline (non-streaming) systems.
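The summary does not say exactly how ASR cues drive context management. The following is a minimal sketch, assuming the cue is sentence-final punctuation on finalized ASR words (a hypothetical choice): it keeps a bounded window of finished sentences plus the still-open segment as the MT source. `ContextManager`, `add_asr_words`, `mt_source`, and `max_prev_sentences` are illustrative names, not the paper's API.

```python
from dataclasses import dataclass, field

# Hypothetical cue: an ASR word ending in one of these marks closes a sentence.
SENTENCE_FINAL = (".", "?", "!")


@dataclass
class ContextManager:
    """Keeps a bounded MT source context driven by ASR sentence boundaries (sketch)."""

    max_prev_sentences: int = 1  # finished sentences kept as left context
    finished: list = field(default_factory=list)
    open_words: list = field(default_factory=list)

    def add_asr_words(self, words):
        """Append newly finalized ASR words; return the sentences they close."""
        closed = []
        for word in words:
            self.open_words.append(word)
            if word.endswith(SENTENCE_FINAL):
                sentence = " ".join(self.open_words)
                self.finished.append(sentence)
                closed.append(sentence)
                self.open_words = []
        return closed

    def mt_source(self):
        """MT input: the most recent finished sentences plus the open segment."""
        k = self.max_prev_sentences
        prev = self.finished[-k:] if k > 0 else []  # [-0:] would return everything
        return " ".join(prev + self.open_words)


cm = ContextManager(max_prev_sentences=1)
cm.add_asr_words(["hello", "there."])  # returns ["hello there."]
cm.add_asr_words(["how", "are"])       # returns []
print(cm.mt_source())                  # hello there. how are
```

Bounding the left context keeps the MT input length, and therefore the per-update compute, roughly constant as a conversation grows.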

📝 Abstract

This paper tackles several challenges that arise when integrating Automatic Speech Recognition (ASR) and Machine Translation (MT) for real-time, on-device streaming speech translation. Although state-of-the-art ASR systems based on Recurrent Neural Network Transducers (RNN-T) can transcribe in real time, real-time streaming translation remains a significant challenge. To address this, we propose a simultaneous translation approach that balances translation quality and latency. We also investigate efficient integration of ASR and MT: linguistic cues generated by the ASR system are used to manage context, and efficient beam-search pruning techniques such as time-out and forced finalization maintain the system's real-time factor. We apply our approach to on-device bilingual conversational speech translation and show that our techniques outperform baselines in both latency and quality. Notably, our technique narrows the quality gap with non-streaming translation systems, paving the way for more accurate and efficient real-time speech translation.
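The abstract names time-out and forced finalization as pruning techniques but gives no details. The sketch below is one plausible reading under stated assumptions: a wall-clock budget (`time_budget_s`) stops beam expansion, a step cap (`max_steps`) bounds decoding, and if no hypothesis has produced end-of-sentence when search stops, the best partial hypothesis is committed. The decoder interface `step_fn`, the toy table, and all parameter names are hypothetical.

```python
import heapq
import time


def pruned_beam_search(step_fn, bos, eos, beam_size=4, max_steps=50,
                       time_budget_s=0.05, clock=time.monotonic):
    """Beam search with a time-out and forced finalization (sketch).

    step_fn(prefix) -> iterable of (token, log_prob) pairs (hypothetical interface).
    Assumes log-probs are <= 0 and no length normalization is applied.
    Returns (score, prefix) of the committed hypothesis.
    """
    start = clock()
    beam = [(0.0, (bos,))]  # live hypotheses: (cumulative log-prob, token prefix)
    finished = []           # hypotheses that emitted eos
    for _ in range(max_steps):
        if clock() - start > time_budget_s:
            break  # time-out: stop expanding and keep what we have
        candidates = []
        for score, prefix in beam:
            for token, log_prob in step_fn(prefix):
                hyp = (score + log_prob, prefix + (token,))
                if token == eos:
                    finished.append(hyp)
                else:
                    candidates.append(hyp)
        if not candidates:
            break
        beam = heapq.nlargest(beam_size, candidates, key=lambda h: h[0])
        # Extending a live hypothesis can only lower its score (log-probs <= 0),
        # so stop once the best finished hypothesis beats the best live one.
        if finished and max(h[0] for h in finished) >= beam[0][0]:
            break
    # Forced finalization: if nothing reached eos, commit the best partial hypothesis.
    pool = finished if finished else beam
    return max(pool, key=lambda h: h[0])


# Hypothetical toy decoder: next-token log-probs depend only on the last token.
TABLE = {
    "<s>": [("hola", -0.1), ("buenas", -1.0)],
    "hola": [("</s>", -0.2), ("amigo", -0.5)],
    "buenas": [("</s>", -0.2), ("amigo", -0.5)],
    "amigo": [("</s>", -0.1)],
}


def toy_step(prefix):
    return TABLE[prefix[-1]]


_, tokens = pruned_beam_search(toy_step, "<s>", "</s>", beam_size=2, time_budget_s=1.0)
print(tokens)  # ('<s>', 'hola', '</s>')

# Simulated clock: the second check sees 1.0 s elapsed, so search times out early.
ticks = iter([0.0, 0.0, 1.0])
_, tokens = pruned_beam_search(toy_step, "<s>", "</s>", beam_size=2,
                               time_budget_s=0.5, clock=lambda: next(ticks))
print(tokens)  # ('<s>', 'hola')
```

Injecting `clock` makes the time-out testable deterministically. In a deployed system the budget could be derived from the target real-time factor (processing time divided by audio duration).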

Problem

Research questions and friction points this paper is trying to address.

Achieving real-time streaming speech translation on devices

Integrating ASR and MT systems with low latency

Balancing translation quality and latency constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Alignment-based streaming MT approach

ASR-MT integration using linguistic cues

Beam-search pruning for real-time efficiency
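One simple way to use alignment information for streaming, shown here as an assumption rather than the paper's exact method, is to emit a target token only once every source word it aligns to has been finalized by ASR. In this sketch `alignment[i]` is the source index for target token `i` (for instance from cross-attention or an external aligner); `committable_prefix`, `AlignmentGatedEmitter`, and `num_stable_src` are hypothetical names.

```python
from dataclasses import dataclass


def committable_prefix(target_tokens, alignment, num_stable_src):
    """Longest target prefix whose tokens align only to finalized source words.

    alignment[i] is the source index aligned to target token i (hypothetical:
    e.g. cross-attention argmax or an external aligner). num_stable_src is the
    number of leading ASR words that will no longer be revised.
    """
    committed = []
    for token, src_idx in zip(target_tokens, alignment):
        if src_idx >= num_stable_src:
            break  # depends on a source word ASR may still change
        committed.append(token)
    return committed


@dataclass
class AlignmentGatedEmitter:
    """Emits each committed target token exactly once (sketch).

    Assumes the MT decoder is forced to extend the already-emitted prefix,
    so earlier output never needs to be retracted.
    """

    emitted: int = 0

    def update(self, target_tokens, alignment, num_stable_src):
        prefix = committable_prefix(target_tokens, alignment, num_stable_src)
        new_tokens = prefix[self.emitted:]
        self.emitted = max(self.emitted, len(prefix))
        return new_tokens


# Source "hello world how are you"; hypothetical alignment of the Spanish output.
emitter = AlignmentGatedEmitter()
hyp = ["hola", "mundo", "cómo", "estás"]
align = [0, 1, 2, 3]
print(emitter.update(hyp, align, num_stable_src=3))  # ['hola', 'mundo', 'cómo']
print(emitter.update(hyp, align, num_stable_src=5))  # ['estás']
```

Holding back tokens tied to unstable source words trades a little latency for output that never has to be rewritten on screen.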