🤖 AI Summary
Arabic information retrieval is difficult because of the language's rich morphology, optional diacritics, the coexistence of Modern Standard Arabic and dialects, and a scarcity of high-quality NLP resources. To address these issues, the authors propose an enhanced Dense Passage Retrieval (DPR) framework designed specifically for Arabic. The method builds on pre-trained Arabic language models and introduces an Attentive Relevance Scoring (ARS) mechanism, which replaces standard interaction mechanisms with an adaptive scoring function that models the semantic relevance between questions and passages. ARS is intended to keep the efficiency and scalability of dense vector representations while improving the accuracy of semantic matching. The authors report improved retrieval performance and significantly higher ranking accuracy on Arabic question-passage ranking. The implementation is publicly released to support reproducibility.
📝 Abstract
Arabic poses a particular challenge for natural language processing (NLP) and information retrieval (IR) due to its complex morphology, optional diacritics, and the coexistence of Modern Standard Arabic (MSA) and various dialects. Despite its growing global significance, Arabic is still underrepresented in NLP research and benchmark resources. In this paper, we present an enhanced Dense Passage Retrieval (DPR) framework developed specifically for Arabic. At the core of our approach is a novel Attentive Relevance Scoring (ARS) mechanism, which replaces standard interaction mechanisms with an adaptive scoring function that more effectively models the semantic relevance between questions and passages. Our method integrates pre-trained Arabic language models and architectural refinements to improve retrieval performance and significantly increase ranking accuracy when answering Arabic questions. The code is publicly available on GitHub: https://github.com/Bekhouche/APR
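
To make the contrast between a standard DPR interaction and an adaptive scoring function concrete, the sketch below ranks passages in two ways: with the usual inner product of question and passage embeddings, and with a small learned scorer of the form v^T tanh(W_q q + W_p p). The second form is an assumption chosen purely for illustration; the abstract does not specify how ARS is defined, and the names `attentive_score`, `W_q`, `W_p`, and `v` are hypothetical. Embeddings and weights are random stand-ins for the outputs of a pre-trained Arabic encoder and for trained parameters.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dpr_score(q, p):
    """Standard DPR relevance: inner product of the two embeddings."""
    return dot(q, p)


def attentive_score(q, p, W_q, W_p, v):
    """Hypothetical adaptive scorer: v^T tanh(W_q q + W_p p).

    Illustrative only; this is NOT the paper's ARS definition.
    """
    hidden = [math.tanh(dot(wq, q) + dot(wp, p)) for wq, wp in zip(W_q, W_p)]
    return dot(v, hidden)


def rank(q, passages, score_fn):
    """Return passage indices ordered from most to least relevant."""
    return sorted(
        range(len(passages)),
        key=lambda i: score_fn(q, passages[i]),
        reverse=True,
    )


def random_vector(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


random.seed(0)
dim, hidden_dim = 4, 3

# Stand-ins for encoder outputs (one question, five candidate passages).
q = random_vector(dim)
passages = [random_vector(dim) for _ in range(5)]

# Stand-ins for trained scorer parameters.
W_q = [random_vector(dim) for _ in range(hidden_dim)]
W_p = [random_vector(dim) for _ in range(hidden_dim)]
v = random_vector(hidden_dim)

dpr_ranking = rank(q, passages, dpr_score)
ars_ranking = rank(
    q, passages, lambda a, b: attentive_score(a, b, W_q, W_p, v)
)
print("inner-product ranking:", dpr_ranking)
print("adaptive-scorer ranking:", ars_ranking)
```

Both scorers operate on precomputed embeddings, so passage vectors can still be indexed offline; only the final scoring step differs. In a real system the scorer parameters would be trained jointly with the encoders rather than sampled at random.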