MUNIChus: Multilingual News Image Captioning Benchmark

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Research on news image captioning has focused predominantly on English, leaving low-resource languages with little support. To bridge this gap, the authors introduce MUNIChus, the first multilingual benchmark for news image captioning, spanning nine languages, including low-resource ones such as Sinhala and Urdu. Each caption is to be generated from a news article together with its associated image, and the benchmark is used to systematically evaluate over twenty state-of-the-art multimodal models. The results show that the task remains challenging, and MUNIChus provides a foundational resource and benchmark for future work on multilingual news image captioning.

📝 Abstract
The goal of news image captioning is to generate captions by integrating news article content with corresponding images, highlighting the relationship between textual context and visual elements. The majority of research on news image captioning focuses on English, primarily because datasets in other languages are scarce. To address this limitation, we create the first multilingual news image captioning benchmark, MUNIChus, comprising 9 languages, including several low-resource languages such as Sinhala and Urdu. We evaluate various state-of-the-art neural news image captioning models on MUNIChus and find that news image captioning remains challenging. We also make MUNIChus publicly available with over 20 models already benchmarked. MUNIChus opens new avenues for further advancements in developing and evaluating multilingual news image captioning models.

Problem

Research questions and friction points this paper is trying to address.

news image captioning
multilingual
low-resource languages
benchmark dataset

Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual
news image captioning
benchmark
low-resource languages
cross-modal generation