Hermes the Polyglot: A Unified Framework to Enhance Expressiveness for Multimodal Interlingual Subtitling

📅 2026-01-31
🤖 AI Summary
This work proposes Hermes, an LLM-based framework for interlingual subtitle translation that integrates speaker diarization, terminology identification, and expressiveness enhancement. It targets three recurring problems in subtitle translation: weak semantic coherence, inaccurate pronoun and terminology translation, and unnatural phrasing. The reported experiments show state-of-the-art speaker diarization performance and translated subtitles that are expressive and contextually coherent.

📝 Abstract
Interlingual subtitling, which translates subtitles of visual media into a target language, is essential for entertainment localization but has not yet been explored in machine translation. Although Large Language Models (LLMs) have significantly advanced the general capabilities of machine translation, the distinctive characteristics of subtitle texts pose persistent challenges in interlingual subtitling, particularly regarding semantic coherence, pronoun and terminology translation, and translation expressiveness. To address these issues, we present Hermes, an LLM-based automated subtitling framework. Hermes integrates three modules: Speaker Diarization, Terminology Identification, and Expressiveness Enhancement, which effectively tackle the above challenges. Experiments demonstrate that Hermes achieves state-of-the-art diarization performance and generates expressive, contextually coherent translations, thereby advancing research in interlingual subtitling.
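The abstract names three modules but does not describe their internals, so the following is only a hypothetical sketch of how such stages might be chained. Every name here (`SubtitleLine`, `diarize`, `identify_terms`, `build_prompt`, the `speaker_of` callback, and the glossary format) is an assumption made for illustration and is not taken from the paper; the stages are simple placeholders for what would be learned models in a real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SubtitleLine:
    index: int
    text: str
    speaker: Optional[str] = None  # filled in by the diarization stage


def diarize(lines: List[SubtitleLine],
            speaker_of: Callable[[SubtitleLine], str]) -> List[SubtitleLine]:
    """Stage 1 (placeholder): attach a speaker label to every line.

    `speaker_of` stands in for a real diarization model.
    """
    return [SubtitleLine(l.index, l.text, speaker_of(l)) for l in lines]


def identify_terms(lines: List[SubtitleLine],
                   glossary: Dict[str, str]) -> Dict[str, str]:
    """Stage 2 (placeholder): keep glossary entries that occur in the dialogue."""
    joined = " ".join(l.text for l in lines)
    return {src: tgt for src, tgt in glossary.items() if src in joined}


def build_prompt(lines: List[SubtitleLine],
                 terms: Dict[str, str],
                 target_lang: str) -> str:
    """Stage 3 (placeholder): assemble speaker and term context for an LLM call."""
    dialogue = "\n".join(f"[{l.speaker}] {l.text}" for l in lines)
    term_block = "\n".join(f"{s} -> {t}" for s, t in sorted(terms.items()))
    return (
        f"Translate the dialogue into {target_lang}, "
        f"keeping a tone that fits each speaker.\n"
        f"Terminology:\n{term_block}\n"
        f"Dialogue:\n{dialogue}"
    )


# Usage: two lines of dialogue, alternating speakers, one matching glossary term.
lines = [SubtitleLine(0, "Where is the Jade Seal?"), SubtitleLine(1, "I hid it.")]
labeled = diarize(lines, lambda l: "A" if l.index % 2 == 0 else "B")
terms = identify_terms(labeled, {"Jade Seal": "Sceau de Jade", "Dragon Gate": "Porte du Dragon"})
prompt = build_prompt(labeled, terms, "French")
```

The point of the sketch is the data flow: speaker labels and a consistent term mapping are computed first, so that the translation step sees who is speaking and which terms must be rendered uniformly.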
Problem

Research questions and friction points this paper is trying to address.

interlingual subtitling
semantic coherence
pronoun translation
terminology translation
translation expressiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interlingual Subtitling
Large Language Models
Speaker Diarization
Terminology Identification
Expressiveness Enhancement
Authors
Chaoqun Cui
Institute of Automation, Chinese Academy of Sciences
Machine Learning, Natural Language Processing
Shijing Wang
Beijing Jiaotong University
Deep Learning
Liangbin Huang
Hujing Digital Media & Entertainment Group
Qingqing Gu
Geely AI lab
Zhaolong Huang
Hujing Digital Media & Entertainment Group
Xiao Zeng
Hujing Digital Media & Entertainment Group
Wenji Mao
Professor at Institute of Automation, Chinese Academy of Sciences
Artificial Intelligence, Intelligent Agents, Social Modeling and Computing