A Benchmark Dataset and a Framework for Urdu Multimodal Named Entity Recognition

πŸ“… 2025-05-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Low-resource constraints hinder Urdu multimodal named entity recognition (MNER) because annotated datasets and standardized baselines are absent. Method: We introduce Twitter2015-Urdu, the first benchmark dataset for Urdu MNER, and propose U-MNER, a lightweight cross-modal fusion framework. U-MNER leverages Urdu-BERT for textual feature extraction and ResNet for visual feature extraction, incorporates a linguistically grounded modality alignment mechanism tailored to Urdu syntactic properties, and integrates a novel cross-modal interaction module. Additionally, we design a rule-based fine-grained entity annotation protocol. Contribution/Results: Experiments show that U-MNER achieves state-of-the-art performance on Twitter2015-Urdu, significantly outperforming existing methods. This work establishes the first standardized baseline for Urdu MNER and provides both critical data resources and a reproducible technical framework to advance MNER research in low-resource languages.

πŸ“ Abstract
The emergence of multimodal content, particularly text and images on social media, has positioned Multimodal Named Entity Recognition (MNER) as an increasingly important area of research within Natural Language Processing. Despite progress in high-resource languages such as English, MNER remains underexplored for low-resource languages like Urdu. The primary challenges include the scarcity of annotated multimodal datasets and the lack of standardized baselines. To address these challenges, we introduce the U-MNER framework and release the Twitter2015-Urdu dataset, a pioneering resource for Urdu MNER. Adapted from the widely used Twitter2015 dataset, it is annotated with Urdu-specific grammar rules. We establish benchmark baselines by evaluating both text-based and multimodal models on this dataset, providing comparative analyses to support future research on Urdu MNER. The U-MNER framework integrates textual and visual context using Urdu-BERT for text embeddings and ResNet for visual feature extraction, with a Cross-Modal Fusion Module to align and fuse information. Our model achieves state-of-the-art performance on the Twitter2015-Urdu dataset, laying the groundwork for further MNER research in low-resource languages.
Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of Urdu multimodal named entity recognition datasets
Developing benchmark baselines for low-resource-language MNER
Integrating textual and visual context for Urdu MNER
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Urdu-BERT for text embeddings
Integrates ResNet for visual features
Employs Cross-Modal Fusion Module
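The three components above (Urdu-BERT text embeddings, ResNet visual features, and a Cross-Modal Fusion Module) can be sketched as follows. This is a hypothetical minimal implementation, not the paper's released code: the dimensions (768 for BERT-style text embeddings, 2048 for ResNet-50 pooled region features, a 7×7 region grid), the use of multi-head cross-attention, and the sigmoid gate are assumptions about how such a fusion module is commonly built; random tensors stand in for the actual encoder outputs.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical sketch of a cross-modal fusion module: text tokens
    attend over projected visual region features, then a learned gate
    blends the attended visual context back into the text stream."""

    def __init__(self, text_dim=768, vis_dim=2048, hidden=768):
        super().__init__()
        # Project ResNet features into the text embedding space.
        self.vis_proj = nn.Linear(vis_dim, hidden)
        # Text queries attend over visual keys/values.
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        # Gate deciding, per token, how much visual context to admit.
        self.gate = nn.Linear(hidden * 2, hidden)

    def forward(self, text_feats, vis_feats):
        # text_feats: (B, T, text_dim), e.g. Urdu-BERT token embeddings
        # vis_feats:  (B, R, vis_dim),  e.g. ResNet 7x7 grid features (R=49)
        v = self.vis_proj(vis_feats)
        attended, _ = self.attn(query=text_feats, key=v, value=v)
        g = torch.sigmoid(self.gate(torch.cat([text_feats, attended], dim=-1)))
        return g * attended + (1 - g) * text_feats

# Stand-ins for the real encoder outputs (batch of 2, 32 tokens, 49 regions).
fusion = CrossModalFusion()
text = torch.randn(2, 32, 768)
vis = torch.randn(2, 49, 2048)
out = fusion(text, vis)
print(tuple(out.shape))  # (2, 32, 768): one fused vector per text token
```

The fused per-token representations would then feed a standard sequence-labeling head (e.g. a linear layer with BIO tags) to produce entity predictions.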
πŸ”Ž Similar Papers
No similar papers found.