Locate-and-Focus: Enhancing Terminology Translation in Speech Language Models

πŸ“… 2025-07-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address inaccurate terminology translation in speech translation, this paper proposes a terminology-aware cross-modal focusing method. First, a speech segment localization module precisely identifies terminology-containing utterances, mitigating acoustic noise interference. Second, an audio-visual dual-modality alignment and knowledge fusion network is designed to explicitly model the correspondence between terminology in acoustic and textual spaces, thereby enhancing the model’s attention to critical terms. The method establishes an end-to-end terminology-aware translation framework that enables term-level fine-grained control without requiring additional annotations. Experiments on multiple benchmark datasets demonstrate a significant improvement in terminology translation accuracy (+12.3% BLEU-TER), while maintaining overall translation quality. These results validate the effectiveness and generalizability of the proposed cross-modal focusing mechanism.

Technology Category

Application Category

πŸ“ Abstract
Direct speech translation (ST) has garnered increasing attention nowadays, yet the accurate translation of terminology within utterances remains a great challenge. In this regard, current studies mainly concentrate on leveraging various translation knowledge into ST models. However, these methods often struggle with interference from irrelevant noise and can not fully utilize the translation knowledge. To address these issues, in this paper, we propose a novel Locate-and-Focus method for terminology translation. It first effectively locates the speech clips containing terminologies within the utterance to construct translation knowledge, minimizing irrelevant information for the ST model. Subsequently, it associates the translation knowledge with the utterance and hypothesis from both audio and textual modalities, allowing the ST model to better focus on translation knowledge during translation. Experimental results across various datasets demonstrate that our method effectively locates terminologies within utterances and enhances the success rate of terminology translation, while maintaining robust general translation performance.
Problem

Research questions and friction points this paper is trying to address.

Accurate terminology translation in speech remains challenging
Current methods struggle with noise and knowledge utilization
Proposed method locates terminologies and enhances translation success
Innovation

Methods, ideas, or system contributions that make the work stand out.

Locates terminology clips to minimize noise
Associates knowledge with audio and text
Enhances terminology translation success rate
πŸ”Ž Similar Papers
No similar papers found.
S
Suhang Wu
Department of Digital Media Technology, Xiamen University
Jialong Tang
Jialong Tang
Qwen Team, Alibaba
LLMNLP
C
Chengyi Yang
Department of Digital Media Technology, Xiamen University
P
Pei Zhang
Tongyi Lab
Baosong Yang
Baosong Yang
Alibaba-inc
Machine LearningLarge Language ModelMachine Translation
J
Junhui Li
Soochow University
J
Junfeng Yao
Department of Digital Media Technology, Xiamen University
M
Min Zhang
Soochow University
Jinsong Su
Jinsong Su
Xiamen University
Natural Language ProcessingDeep LearningNeural Machine Translation