Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing information-gain-based read/write policies for simultaneous speech translation have limited awareness of temporal context, so they tend to read most of the audio before writing, which makes it hard to combine low latency with high translation quality. This work improves the previously introduced REINA method with two variants, a supervised alignment network (REINA-SAN) and a timestep-augmented network (REINA-TAN), that give the policy more temporal context and resolve its stability issues. REINA-TAN yields a slightly better Pareto frontier for streaming efficiency, while REINA-SAN is more robust against read loops. Applied to Whisper, both variants improve Normalized Streaming Efficiency (NoSE) scores by up to 7.1% over existing competitive baselines.

📝 Abstract

Simultaneous Speech Translation (SimulST) requires balancing high translation quality with low latency. Recent work introduced REINA, a method that trains a Read/Write policy based on estimating the information gain of reading more audio. However, we find that information-based policies often lack temporal context, which biases the policy toward reading most of the audio before starting to write. We improve REINA using two distinct strategies: a supervised alignment network (REINA-SAN) and a timestep-augmented network (REINA-TAN). Our results demonstrate that while both methods significantly outperform the baseline and resolve stability issues, REINA-TAN provides a slightly superior Pareto frontier for streaming efficiency, whereas REINA-SAN offers more robustness against 'read loops'. Applied to Whisper, both methods improve the Pareto frontier of streaming efficiency, as measured by Normalized Streaming Efficiency (NoSE) scores, by up to 7.1% over existing competitive baselines.
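
To make the failure mode described in the abstract concrete, here is a toy sketch of a generic read/write loop. It is not the REINA, REINA-SAN, or REINA-TAN implementation: the function names, the `(chunks_read, tokens_written)` policy signature, the 0.5 threshold, and both example policies are assumptions made purely for illustration. A policy whose gain estimate never drops reads all the audio before writing, while a policy that also sees how far reading is ahead of writing interleaves the two.

```python
from typing import Callable, List

# Hypothetical policy signature (an assumption, not from the paper):
# (chunks_read, tokens_written) -> estimated gain of reading more audio.
GainFn = Callable[[int, int], float]


def run_policy(num_chunks: int, num_tokens: int, gain_fn: GainFn,
               threshold: float = 0.5) -> List[str]:
    """Return the READ/WRITE action sequence for one toy utterance."""
    actions: List[str] = []
    read, written = 0, 0
    while written < num_tokens:
        can_read = read < num_chunks
        # READ while audio remains and the estimated gain exceeds the threshold.
        if can_read and gain_fn(read, written) > threshold:
            read += 1
            actions.append("READ")
        else:
            written += 1
            actions.append("WRITE")
    return actions


def always_high_gain(read: int, written: int) -> float:
    # Toy policy with no temporal context: reading always looks worthwhile.
    return 0.9


def lag_aware_gain(read: int, written: int) -> float:
    # Toy policy with temporal context: the gain estimate drops once
    # reading is two chunks ahead of writing.
    return 0.9 if read - written < 2 else 0.1


if __name__ == "__main__":
    print(run_policy(4, 4, always_high_gain))
    print(run_policy(4, 4, lag_aware_gain))
```

With four audio chunks and four target tokens, the first policy emits its first WRITE only after every chunk has been read, whereas the second writes its first token after two chunks.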

Problem

Research questions and friction points this paper is trying to address.

Simultaneous Speech Translation

information gain

temporal context

latency

streaming efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Simultaneous Speech Translation

Information Gain Policy

Temporal Awareness