🤖 AI Summary
Large multilingual speech translation models often carry excessive parameter counts, making it difficult to achieve high inference efficiency and high translation quality at the same time. To address this, we propose a parasitic dual-scale modeling paradigm centered on the Key-Value Sparse Prediction Network (KVSPN), integrated with enhanced speculative decoding, structured pruning, and knowledge distillation. Our method significantly accelerates inference without compromising accuracy: KVSPN alone achieves a 40% speedup, while the full pipeline, including distillation, yields a 2.6× inference acceleration over Whisper Medium with superior BLEU and TER scores. Evaluated across six major languages, our approach establishes new state-of-the-art results in both translation quality and latency, and is the first work to jointly optimize performance and efficiency in multilingual speech translation. This enables practical, cost-effective on-device deployment.
📝 Abstract
Recent advancements in speech-to-text translation have led to the development of multilingual models capable of handling multiple language pairs simultaneously. However, these unified models often suffer from large parameter sizes, making it challenging to balance inference efficiency and performance, particularly in local deployment scenarios. We propose an innovative Parasitic Dual-Scale Approach, which combines an enhanced speculative sampling method with model compression and knowledge distillation techniques. Building on the Whisper Medium model, we adapt it for multilingual speech translation as whisperM2M and integrate our novel KVSPN module, achieving state-of-the-art (SOTA) performance across six popular languages with improved inference efficiency. KVSPN enables a 40% speedup with no BLEU score degradation; combined with distillation methods, it achieves a 2.6× speedup over the original Whisper Medium with superior performance.
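The abstract's speedup comes from speculative decoding: a cheap draft model proposes several tokens, and the large target model verifies them in a single pass, accepting the longest agreeing prefix. The sketch below illustrates only the generic greedy variant of this idea, not the paper's KVSPN module; the two toy deterministic next-token tables stand in for real draft and target models and are purely hypothetical.

```python
# Generic greedy speculative decoding sketch (illustration only, NOT the
# paper's KVSPN method). A toy draft model proposes k tokens; the toy target
# model verifies them; the longest agreeing prefix is accepted, plus one
# corrected token at the first disagreement.

def draft_next(token):
    # Cheap draft model (hypothetical deterministic next-token table).
    return {0: 1, 1: 2, 2: 3, 3: 0}.get(token, 0)

def target_next(token):
    # Accurate target model (hypothetical); disagrees with the draft at 2.
    return {0: 1, 1: 2, 2: 9, 3: 0, 9: 0}.get(token, 0)

def speculative_step(prefix, k=4):
    """Propose k draft tokens, verify against the target, return accepted tokens."""
    # 1) Draft model autoregressively proposes k candidate tokens.
    drafts, tok = [], prefix[-1]
    for _ in range(k):
        tok = draft_next(tok)
        drafts.append(tok)
    # 2) Target model scores the k positions (one batched pass in practice).
    accepted, tok = [], prefix[-1]
    for d in drafts:
        t = target_next(tok)
        if t == d:              # draft agrees with target: accept and continue
            accepted.append(d)
            tok = d
        else:                   # first disagreement: emit target's token, stop
            accepted.append(t)
            break
    return accepted

print(speculative_step([0]))    # accepts two draft tokens, then corrects
```

Because verification runs once over all drafted positions, several tokens can be emitted per target-model pass, which is where the latency savings originate; the paper's KVSPN refines the drafting side of this loop.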