Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

2025-12-19
Citations: 0
Influential: 0
AI Summary
SimulEval, the repository on which streaming speech translation research has relied, is no longer maintained and lacks support for output revision, long-form audio streams, and live demonstration. Method: We propose simulstream, the first open-source framework dedicated to unified evaluation and demonstration of Streaming Speech-to-Text Translation (StreamST) systems on long-form audio. Its evaluation setup accommodates re-translation, so systems that decode incrementally and systems that revise their output can be assessed together. We design a lightweight streaming inference interface and modules for computing the latency metrics Average Lagging (AL) and Differentiable Average Lagging (DAL). Additionally, we integrate an interactive web interface for demonstrating systems, powered by Gradio and WebSockets. Contribution/Results: simulstream allows incremental and re-translation systems to be compared within one framework in terms of both quality and latency, which supports reproducible evaluation and fair model comparison. The framework is fully open source.
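The AL metric mentioned above is usually defined in the simultaneous translation literature as Average Lagging: the mean gap between the delay at which each target token is emitted and the delay an ideal, perfectly paced system would have, counted up to the first token emitted after the full source has been read. The following is a minimal sketch of that textbook definition, not simulstream's own implementation. The function name `average_lagging` and its arguments are assumptions made for illustration, and the target-to-source rate is taken from the hypothesis length.

```python
from typing import Sequence


def average_lagging(delays: Sequence[float], source_length: float) -> float:
    """Average Lagging for one utterance (illustrative sketch).

    delays[i] is the amount of source (tokens, or milliseconds of audio)
    consumed when the i-th target token was emitted.
    source_length is the total source length in the same unit.
    """
    if not delays:
        raise ValueError("delays must contain at least one target token")
    if source_length <= 0:
        raise ValueError("source_length must be positive")

    num_targets = len(delays)
    total_lag = 0.0
    counted = 0
    for i, delay in enumerate(delays):
        # Delay an ideal system would have for token i if it emitted
        # target tokens at a constant pace across the whole source.
        ideal_delay = i * source_length / num_targets
        total_lag += delay - ideal_delay
        counted += 1
        # Stop at the first token emitted once the full source was read;
        # later tokens carry no additional latency information.
        if delay >= source_length:
            break
    return total_lag / counted


# A wait-3 policy on a 6-token source and 6-token target lags by 3 tokens.
wait3 = average_lagging([3, 4, 5, 6, 6, 6], source_length=6)
# An offline system that reads everything first lags by the full source.
offline = average_lagging([6, 6, 6, 6, 6, 6], source_length=6)
```

With millisecond delays and the audio duration as `source_length`, the same function gives a speech-style AL in milliseconds, under the same assumptions.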

๐Ÿ“ Abstract
Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech, imposing strict latency constraints and demanding models that balance partial-information decision-making with high translation quality. Research efforts on the topic have so far relied on the SimulEval repository, which is no longer maintained and does not support systems that revise their outputs. In addition, it was designed to simulate the processing of short segments rather than long-form audio streams, and it does not provide an easy way to showcase systems in a demo. As a solution, we introduce simulstream, the first open-source framework dedicated to unified evaluation and demonstration of StreamST systems. Designed for long-form speech processing, it supports not only incremental decoding approaches but also re-translation methods, enabling their comparison within the same framework in terms of both quality and latency. It also offers an interactive web interface to demo any system built within the tool.
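The difference between incremental decoding and re-translation can be made concrete with a small sketch. An incremental system only appends tokens, while a re-translation system may retract part of what it has already shown before appending. One way to treat both uniformly is an update that carries a number of trailing tokens to delete plus the tokens to add. The names below (`OutputUpdate`, `apply_update`, `retranslation_update`) are hypothetical and are not taken from simulstream's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutputUpdate:
    """Hypothetical update produced after processing one audio chunk."""
    new_tokens: List[str]
    # Trailing tokens to retract from the displayed text.
    # Always 0 for a purely incremental system.
    deleted_tokens: int = 0


def apply_update(displayed: List[str], update: OutputUpdate) -> List[str]:
    """Return the displayed text after retracting and appending tokens."""
    if not 0 <= update.deleted_tokens <= len(displayed):
        raise ValueError("cannot delete more tokens than are displayed")
    kept = displayed[: len(displayed) - update.deleted_tokens]
    return kept + update.new_tokens


def retranslation_update(previous: List[str], current: List[str]) -> OutputUpdate:
    """Turn a full re-translation into a delete-then-append update."""
    # Length of the longest common prefix of the two hypotheses.
    common = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        common += 1
    return OutputUpdate(
        new_tokens=current[common:],
        deleted_tokens=len(previous) - common,
    )


# Incremental step: append only.
shown = apply_update([], OutputUpdate(new_tokens=["the", "cat"]))
# Re-translation step: the system now prefers "the cats are".
revision = retranslation_update(shown, ["the", "cats", "are"])
shown = apply_update(shown, revision)
```

Under this representation, a latency or stability metric can inspect `deleted_tokens` to account for revisions, while incremental systems pass through unchanged with zero deletions.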
Problem

Research questions and friction points this paper is trying to address.

Evaluates streaming speech-to-text translation with latency constraints
Supports revision and long-form audio for translation systems
Provides interactive demo for real-time translation evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source framework for streaming speech translation
Supports both incremental decoding and re-translation methods
Includes interactive web interface for system demonstration