DuwatBench: Bridging Language and Visual Heritage through an Arabic Calligraphy Benchmark for Multimodal Understanding

📅 2026-01-27

🤖 AI Summary

This study addresses the limitations of existing multimodal models in understanding stylized and artistic Arabic calligraphic text, as well as the absence of evaluation benchmarks tailored to Arabic visual textual heritage. To bridge this gap, the authors present DuwatBench, presented as the first systematic multimodal understanding benchmark for Arabic calligraphy. It covers six classical and modern scripts with 1,272 samples (approximately 1,475 unique words) and sentence-level detection annotations. Using this benchmark, they evaluate 13 state-of-the-art multilingual and Arabic-specific multimodal models and find significant deficiencies in recognizing calligraphic variants, in robustness to artistic deformations, and in fine-grained image-text alignment. The work fills a gap in evaluating artistic writing systems beyond Latin scripts and supports the development of culturally aware and equitable multimodal AI.

📝 Abstract

Arabic calligraphy represents one of the richest visual traditions of the Arabic language, blending linguistic meaning with artistic form. Although multimodal models have advanced across languages, their ability to process Arabic script, especially in artistic and stylized calligraphic forms, remains largely unexplored. To address this gap, we present DuwatBench, a benchmark of 1,272 curated samples containing about 1,475 unique words across six classical and modern calligraphic styles, each paired with sentence-level detection annotations. The dataset reflects real-world challenges in Arabic writing, such as complex stroke patterns, dense ligatures, and stylistic variations that often defeat standard text recognition systems. Using DuwatBench, we evaluated 13 leading Arabic and multilingual multimodal models and showed that while they perform well on clean text, they struggle with calligraphic variation, artistic distortions, and precise visual-text alignment. By publicly releasing DuwatBench and its annotations, we aim to advance culturally grounded multimodal research, foster fair inclusion of the Arabic language and visual heritage in AI systems, and support continued progress in this area. Our dataset (https://huggingface.co/datasets/MBZUAI/DuwatBench) and evaluation suite (https://github.com/mbzuai-oryx/DuwatBench) are publicly available.
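
The abstract does not say which metrics the evaluation suite reports. Purely as an illustration of how a model's transcription of a calligraphy image might be scored against a reference, the sketch below computes a character error rate (CER) from Levenshtein edit distance. The function names, the normalization step (dropping diacritics and the tatweel elongation character), and the choice of CER itself are assumptions made for this example, not details taken from DuwatBench.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, purely decorative


def normalize_arabic(text: str) -> str:
    """Hypothetical normalization: drop combining marks and tatweel, collapse spaces."""
    text = unicodedata.normalize("NFC", text)
    kept = [
        ch for ch in text
        if unicodedata.category(ch) != "Mn" and ch != TATWEEL
    ]
    return " ".join("".join(kept).split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling row of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the normalized reference length."""
    ref = normalize_arabic(reference)
    hyp = normalize_arabic(hypothesis)
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)
```

Under this normalization, a prediction that omits the diacritics of a vocalized reference is not penalized. Whether that is appropriate depends on the benchmark's annotation conventions, which are not described here.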

Problem

Research questions and friction points this paper is trying to address.

Arabic calligraphy

multimodal understanding

visual-text alignment

stylized text recognition

cultural heritage

Innovation

Methods, ideas, or system contributions that make the work stand out.

Arabic calligraphy

multimodal benchmark

visual-text alignment