Streaming Sequence Transduction through Dynamic Compression

📅 2024-02-02
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the fundamental trade-off among latency, memory consumption, and translation quality in streaming sequence-to-sequence tasks—particularly automatic speech recognition (ASR) and simultaneous speech translation. To this end, we propose STAR, a novel streaming Transformer architecture. Its core innovations are: (i) a dynamic streaming segmentation mechanism that replaces fixed-size windows or hard truncation with learnable, adaptive segment boundaries; and (ii) anchor representation learning, jointly optimized with streaming attention masking to efficiently compress historical context within the Transformer framework. Experiments demonstrate that STAR achieves near-lossless 12× compression in ASR, while in simultaneous speech translation it reduces average latency by 37%, cuts memory usage by 52%, and lowers word error rate (WER) by 8.3% relatively—substantially outperforming existing streaming approaches.

Technology Category

Application Category

📝 Abstract
We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. STAR dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition (ASR) and outperforming existing methods. Moreover, STAR demonstrates superior segmentation and latency-quality trade-offs in simultaneous speech-to-text tasks, optimizing latency, memory footprint, and quality.
Problem

Research questions and friction points this paper is trying to address.

Efficient sequence transduction for streaming input
Dynamic compression with minimal loss in ASR
Optimizing latency and quality in speech-to-text
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based model for stream transduction
Dynamic segmentation with anchor representations
Nearly lossless 12x compression in ASR
🔎 Similar Papers
No similar papers found.