🤖 AI Summary
Urdu idiom translation remains challenging for machine translation because Urdu is a low-resource language and idioms carry meaning that depends on cultural context. This work introduces the first evaluation datasets for Urdu-to-English idiom translation, covering both Native Urdu script and Roman Urdu and annotated with gold-standard English equivalents, to support systematic assessment of how well large language models (LLMs) and neural machine translation (NMT) systems preserve idiomatic and cultural meaning. Translation quality is measured with four automatic metrics (BLEU, BERTScore, COMET, and XCOMET), and several prompting strategies are compared against direct translation. Results show that Native Urdu input yields more accurate idiomatic translations than Roman Urdu input; prompt engineering improves on direct translation, but the differences among prompt types are relatively minor. The study shows that text representation substantially affects idiom translation quality and provides benchmark datasets for idiom translation in a low-resource language.
📝 Abstract
Idiomatic translation remains a significant challenge in machine translation and has received limited prior attention, especially for low-resource languages such as Urdu. To advance research in this area, we introduce the first evaluation datasets for Urdu-to-English idiomatic translation, covering both Native Urdu and Roman Urdu scripts and annotated with gold-standard English equivalents. We evaluate multiple open-source Large Language Models (LLMs) and Neural Machine Translation (NMT) systems on this task, focusing on their ability to preserve idiomatic and cultural meaning. Translation quality is assessed with the automatic metrics BLEU, BERTScore, COMET, and XCOMET. Our findings indicate that prompt engineering enhances idiomatic translation compared to direct translation, though performance differences among prompt types are relatively minor. Moreover, cross-script comparisons reveal that text representation substantially affects translation quality, with Native Urdu inputs producing more accurate idiomatic translations than Roman Urdu inputs.