Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large multilingual speech translation models often carry excessive parameter counts, making it difficult to achieve high inference efficiency and translation quality at the same time. To address this, we propose a parasitic dual-scale modeling paradigm centered on the Key-Value Sparse Prediction Network (KVSPN), integrated with enhanced speculative decoding, structured pruning, and knowledge distillation. Our method accelerates inference without compromising accuracy: KVSPN alone achieves a 40% speedup, and the full pipeline, including distillation, yields a 2.6× inference acceleration over Whisper Medium with superior BLEU and TER scores. Evaluated across six major languages, our approach establishes new state-of-the-art results in both translation quality and latency, marking the first work to jointly optimize performance and efficiency in multilingual speech translation and enabling practical, cost-effective on-device deployment.

📝 Abstract
Recent advancements in speech-to-text translation have led to the development of multilingual models capable of handling multiple language pairs simultaneously. However, these unified models often suffer from large parameter sizes, making it challenging to balance inference efficiency and performance, particularly in local deployment scenarios. We propose an innovative Parasitic Dual-Scale Approach, which combines an enhanced speculative sampling method with model compression and knowledge distillation techniques. Building on the Whisper Medium model, we enhance it for multilingual speech translation into whisperM2M, and integrate our novel KVSPN module, achieving state-of-the-art (SOTA) performance across six popular languages with improved inference efficiency. KVSPN enables a 40% speedup with no BLEU score degradation. Combined with distillation methods, it represents a 2.6× speedup over the original Whisper Medium with superior performance.
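The abstract attributes much of the speedup to an enhanced speculative sampling scheme. The paper's KVSPN drafter is not shown on this page; the sketch below is only the generic draft-and-verify speculative decoding loop (greedy variant), with toy stand-in functions for the draft and target models:

```python
# Minimal sketch of greedy draft-and-verify speculative decoding.
# draft_model and target_model are hypothetical toy stand-ins,
# NOT the paper's KVSPN or Whisper models.

def draft_model(context):
    """Cheap proposer: toy rule, echo last token + 1 (mod 50)."""
    return (context[-1] + 1) % 50

def target_model(context):
    """Expensive verifier: mostly agrees with the draft,
    but resets to token 0 every 7th position (toy rule)."""
    return 0 if len(context) % 7 == 0 else (context[-1] + 1) % 50

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens greedily: draft k tokens cheaply, then
    accept the longest prefix the target model agrees with."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft k candidate tokens with the cheap model.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify against the target model.
        accepted, ctx = [], list(out)
        for t in draft:
            expected = target_model(ctx)
            if t == expected:
                accepted.append(t)
                ctx.append(t)
            else:
                # Replace the first mismatch with the target's token, re-draft.
                accepted.append(expected)
                break
        out.extend(accepted)
    return out[len(prompt):][:n_tokens]
```

With greedy verification, the output is token-for-token identical to decoding with the target model alone; the speedup comes from the target model scoring k drafted tokens per call instead of producing one.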
Problem

Research questions and friction points this paper is trying to address.

Balancing efficiency and performance in multilingual speech translation
Reducing large parameter sizes in unified multilingual models
Improving inference speed without degrading translation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parasitic Dual-Scale Approach for efficiency
KVSPN module enhances speed and accuracy
Distillation methods boost performance significantly
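The distillation recipe behind the last bullet is not detailed on this page. As a reference point, a minimal sketch of the standard logit-matching knowledge-distillation loss (Hinton-style soft targets; an assumption, not necessarily the paper's variant) in plain Python:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl
```

Matching the teacher's logits exactly gives zero loss; any divergence gives a positive loss that the student minimizes alongside the usual translation objective.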
Authors

Chenyang Le (Shanghai Jiaotong University)
Yinfeng Xia (Honor Device Co, Ltd, China)
Huiyan Li (Honor Device Co, Ltd, China)
Manhong Wang (Honor Device Co, Ltd, China)
Yutao Sun (Tsinghua University; Natural Language Processing, Machine Learning)
Xingyang Ma (Honor Device Co, Ltd, China)
Yanmin Qian (Professor, Shanghai Jiao Tong University; Speech and Language Processing, Signal Processing, Machine Learning)