AI Summary
The existing SimulEval framework is no longer maintained and lacks support for output revision, long audio stream processing, and real-time demonstration. Method: We propose simulstream, the first open-source evaluation and demonstration framework tailored to long audio streams in streaming speech-to-text translation (StreamST). It introduces a novel evaluation paradigm compatible with retranslation mechanisms, enabling unified assessment of incremental-decoding and output-revision systems. We design a lightweight streaming inference interface and modules for computing latency metrics such as Average Lagging (AL) and Differentiable Average Lagging (DAL). Additionally, we integrate an interactive web visualization interface powered by Gradio and WebSockets. Contribution/Results: simulstream improves evaluation reproducibility and fairness in model comparison. The framework is fully open-sourced and has been adopted by the research community.
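The latency metrics named above are computed from the emission delays of each target token. As a minimal sketch, here is the standard Average Lagging definition from the simultaneous-translation literature (Ma et al., 2019); this illustrates the metric itself, not necessarily the framework's exact implementation:

```python
def average_lagging(delays, src_len, tgt_len):
    """Average Lagging (AL).

    delays[i] = number of source units read when target token i+1 was emitted.
    The oracle reads src_len/tgt_len source units per emitted target token;
    AL averages how far the system lags behind that oracle, up to the first
    token emitted after the full source has been consumed.
    """
    # tau: 1-based index of the first target token emitted once the whole
    # source has been read (falls back to the last token if never reached).
    tau = next((i for i, d in enumerate(delays, start=1) if d >= src_len),
               len(delays))
    rate = src_len / tgt_len  # oracle reading rate
    return sum(delays[i - 1] - (i - 1) * rate for i in range(1, tau + 1)) / tau
```

For example, a system that emits one token after each source unit on a length-4 pair lags by exactly one unit (AL = 1.0), while a system that waits for the full source before emitting anything gets AL = 4.0.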
Abstract
Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech, imposing strict latency constraints and demanding models that balance partial-information decision-making with high translation quality. Research efforts on the topic have so far relied on the SimulEval repository, which is no longer maintained and does not support systems that revise their outputs. In addition, it has been designed for simulating the processing of short segments, rather than long-form audio streams, and it does not provide an easy method to showcase systems in a demo. As a solution, we introduce simulstream, the first open-source framework dedicated to unified evaluation and demonstration of StreamST systems. Designed for long-form speech processing, it supports not only incremental decoding approaches but also re-translation methods, enabling their comparison within the same framework in terms of both quality and latency. In addition, it offers an interactive web interface to demo any system built within the tool.
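The unified treatment of incremental decoding and re-translation can be illustrated with a small hypothetical interface: each inference step emits either tokens to append or a fully revised hypothesis. All names and types below are illustrative assumptions, not simulstream's actual API:

```python
from dataclasses import dataclass

@dataclass
class StepOutput:
    text: str          # text produced at this step
    is_revision: bool  # True: replace the whole hypothesis (re-translation);
                       # False: append to it (incremental decoding)

def update_hypothesis(current: list, out: StepOutput) -> list:
    # Re-translation systems may rewrite earlier output, while incremental
    # systems only ever extend it; both reduce to a single update rule,
    # which is what makes evaluating them in one framework possible.
    return out.text.split() if out.is_revision else current + out.text.split()

# Incremental step: new tokens are appended to the running hypothesis.
hyp = update_hypothesis(["hello"], StepOutput("world", is_revision=False))
# Re-translation step: the full hypothesis is replaced by a revised one.
hyp = update_hypothesis(hyp, StepOutput("hello there world", is_revision=True))
```

A latency metric can then be charged for revised spans as well as appended ones, which is why a retranslation-compatible evaluation paradigm matters for fair comparison.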