🤖 AI Summary
To address insufficient contextual modeling and error propagation from automatic speech recognition (ASR) in speech translation (ST), this paper proposes the first large language model (LLM)-based paradigm for jointly refining ST outputs and ASR transcripts. Methodologically, it incorporates document-level context to enhance semantic coherence, supports training-free in-context learning, and enables parameter-efficient fine-tuning. The authors evaluate multiple LLMs (including GPT-3.5-turbo, LLaMA3-8B, and Mistral-12B) using zero-shot prompting and context-aware fine-tuning. Extensive experiments across seven translation tasks from MuST-C and CoVoST 2 demonstrate consistent and significant improvements over single-task refinement baselines, with joint refinement augmented by document-level context achieving the best performance. All code and data are publicly released.
📝 Abstract
Recent advancements in large language models (LLMs) have demonstrated remarkable capabilities across various language tasks. Inspired by the success of text-to-text translation refinement, this paper investigates how LLMs can improve speech translation through a joint refinement process. By jointly refining speech translation (ST) output and automatic speech recognition (ASR) transcription with LLMs, the performance of the ST model is significantly improved in both training-free in-context learning and parameter-efficient fine-tuning scenarios. Additionally, we explore the effect of document-level context on refinement under the context-aware fine-tuning scenario. Experimental results on the MuST-C and CoVoST 2 datasets, covering seven translation tasks, demonstrate the effectiveness of the proposed approach with several popular LLMs, including GPT-3.5-turbo, LLaMA3-8B, and Mistral-12B. Further analysis suggests that jointly refining both transcription and translation yields better performance than refining translation alone, and that incorporating document-level context significantly enhances refinement performance. We release our code and datasets on GitHub.