Towards Visually-Guided Movie Subtitle Translation for Indic Languages

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty of conveying emotion and contextual nuance when translating English subtitles into low-resource Indic languages, where text alone is often insufficient. Using five full-length films, the authors compare two lightweight visual grounding strategies: structured visual attribute summaries extracted over a 5-minute sliding window, and free-text summaries of the visual gaps between subtitles. Because temporal misalignment between subtitles and frames often makes indiscriminate visual grounding ineffective, they evaluate oracle selective grounding, which replaces only the lowest-quality 20-30% of baseline segments with visually enhanced outputs. This consistently improves COMET over the text-only baseline while requiring far less visual processing. Of the two strategies, coarse attribute-based summaries are the more robust, capturing scene-level emotion and subtle contextual cues that text alone often misses.

📝 Abstract

Movie subtitle translation is inherently multimodal, yet text-only systems often miss visual cues needed to convey emotion, action, and social nuance, especially for low-resource Indic languages (English to Hindi, Bengali, Telugu, Tamil, and Kannada). We present a case study on five full-length films and compare two lightweight visual grounding strategies: structured attribute summaries from a 5-minute sliding window and free-text summaries of inter-subtitle visual gaps. Our analysis shows that temporal misalignment between subtitles and frames is a major obstacle in long-form video, often rendering indiscriminate visual grounding ineffective. However, oracle selective grounding, which replaces only the lowest-quality 20-30% of baseline segments with visually enhanced outputs, consistently improves COMET over the text-only baseline while requiring far less visual processing. Of the two approaches, coarse attribute-based visual context summarization is more robust, capturing scene-level emotion and subtle contextual cues that text alone often misses.
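The oracle selective grounding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, its parameters, and the assumption that each baseline segment already has a numeric quality score (for example a segment-level COMET score, where higher is better) are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def oracle_selective_grounding(
    baseline: Sequence[str],
    visual: Sequence[str],
    baseline_scores: Sequence[float],
    fraction: float = 0.25,
) -> List[str]:
    """Replace the lowest-scoring fraction of baseline segments.

    Hypothetical helper: `baseline` holds text-only translations, `visual`
    holds the visually enhanced translations of the same segments, and
    `baseline_scores` holds an assumed per-segment quality score for the
    baseline (higher is better).
    """
    if not (len(baseline) == len(visual) == len(baseline_scores)):
        raise ValueError("all inputs must have the same number of segments")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")

    # Number of segments to swap, e.g. 20-30% of the film's subtitles.
    k = int(len(baseline) * fraction)

    # Indices of the k lowest-scoring baseline segments.
    ranked = sorted(range(len(baseline)), key=lambda i: baseline_scores[i])
    to_replace = set(ranked[:k])

    # Keep the baseline everywhere else, so only k segments need visual output.
    return [
        visual[i] if i in to_replace else baseline[i]
        for i in range(len(baseline))
    ]
```

Under these assumptions, only the selected segments need a visually enhanced translation at all, which is where the reduction in visual processing would come from.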

Problem

Research questions and friction points this paper is trying to address.

subtitle translation

multimodal

Indic languages

visual cues

low-resource

Innovation

Methods, ideas, or system contributions that make the work stand out.

visually-guided translation

multimodal subtitle translation

low-resource Indic languages