DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation

📅 2025-04-07
🤖 AI Summary
To address discourse incoherence in speech translation (ST) caused by automatic speech recognition (ASR) noise, this paper proposes DoCIA, an online framework that incorporates document-level context into ST. The framework splits the ST pipeline into four stages and injects document-level context into three of them (ASR refinement, translation, and translation refinement) through auxiliary LLM-based modules, using the context in a multi-level manner to keep computational overhead low. A determination mechanism guards against hallucinations caused by excessive refinement. The reported contributions are: (1) a low-overhead online approach to document-level context modeling for ST; and (2) significant improvements over traditional ST baselines across four LLMs on both sentence-level and discourse-level metrics, which supports the value of document-level context for noisy speech translation.

📝 Abstract
Document-level context is crucial for handling discourse challenges in text-to-text document-level machine translation (MT). Despite the increased discourse challenges introduced by noise from automatic speech recognition (ASR), the integration of document-level context in speech translation (ST) remains insufficiently explored. In this paper, we develop DoCIA, an online framework that enhances ST performance by incorporating document-level context. DoCIA decomposes the ST pipeline into four stages. Document-level context is integrated into the ASR refinement, MT, and MT refinement stages through auxiliary LLM (large language model)-based modules. Furthermore, DoCIA leverages document-level information in a multi-level manner while minimizing computational overhead. Additionally, a simple yet effective determination mechanism is introduced to prevent hallucinations from excessive refinement, ensuring the reliability of the final results. Experimental results show that DoCIA significantly outperforms traditional ST baselines in both sentence and discourse metrics across four LLMs, demonstrating its effectiveness in improving ST performance.
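The staged design described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the first stage (it is taken here to be the ASR step itself, whose output is the function input), and the prompt wording, the `DocContext` sliding window, the `LLM` callable type, and the length-ratio check in `accept_refinement` are all hypothetical stand-ins. In particular, the length-ratio check is only a placeholder for the paper's determination mechanism, whose actual criterion is not given on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: any callable that maps a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class DocContext:
    """Rolling document-level context (the window size is an assumption)."""
    window: int = 3
    sources: List[str] = field(default_factory=list)
    targets: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Show only the most recent source/translation pairs to bound prompt size.
        pairs = zip(self.sources[-self.window:], self.targets[-self.window:])
        return "\n".join(f"{src} => {tgt}" for src, tgt in pairs)

    def update(self, src: str, tgt: str) -> None:
        self.sources.append(src)
        self.targets.append(tgt)


def accept_refinement(draft: str, refined: str, max_len_ratio: float = 1.5) -> bool:
    """Placeholder guard, NOT the paper's mechanism: reject empty output or
    output whose length differs from the draft by more than max_len_ratio."""
    if not refined.strip():
        return False
    ratio = len(refined) / max(len(draft), 1)
    return 1 / max_len_ratio <= ratio <= max_len_ratio


def translate_segment(asr_text: str, ctx: DocContext, llm: LLM) -> str:
    """Run ASR refinement, MT, and MT refinement for one ASR segment."""
    history = ctx.render()

    # ASR refinement: repair recognition errors using document context.
    refined_src = llm(f"Context:\n{history}\nFix ASR errors:\n{asr_text}")
    if not accept_refinement(asr_text, refined_src):
        refined_src = asr_text  # fall back to the raw ASR output

    # MT: translate the (possibly refined) source with the same context.
    draft = llm(f"Context:\n{history}\nTranslate:\n{refined_src}")

    # MT refinement: polish the draft, keeping it only if the guard accepts it.
    polished = llm(
        f"Context:\n{history}\nSource:\n{refined_src}\nImprove translation:\n{draft}"
    )
    final = polished if accept_refinement(draft, polished) else draft

    # Online setting: each finished segment becomes context for the next one.
    ctx.update(refined_src, final)
    return final
```

Passing the LLM as a plain callable keeps the sketch independent of any particular model API; in practice each stage could use a different prompt template or a different model.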
Problem

Research questions and friction points this paper is trying to address.

Incorporates document-level context in speech translation
Addresses noise from automatic speech recognition in translation
Prevents hallucinations from excessive refinement in translation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online framework integrating document-level context
Multi-level use of document-level information with low computational overhead
Determination mechanism prevents excessive refinement
Authors

Xinglin Lyu
PhD Student of Software Engineering, Soochow University
Machine Translation, Natural Language Processing

Wei Tang
Huawei Translation Services Center, Beijing, China

Yuang Li
2012 Lab, Huawei
Speech, NLP

Xiaofeng Zhao
Huawei Translation Services Center, Beijing, China

Ming Zhu
Huawei Translation Services Center, Beijing, China

Junhui Li
School of Computer Science and Technology, Soochow University, Suzhou, China

Yunfei Lu
Huawei
Large Language Model, Machine Translation, Data Mining

Min Zhang
Huawei Translation Services Center, Beijing, China

Daimeng Wei
Huawei Translation Services Center, Beijing, China

Hao Yang
Huawei Translation Services Center, Beijing, China