Fanar: An Arabic-Centric Multimodal Generative AI Platform

📅 2025-01-18
🤖 AI Summary
To address the limitations of Arabic AI systems in content generation, dialect identification, information veracity verification, and domain-specific adaptation (particularly in religious and news contexts), this paper introduces the first full-stack, Arabic-centric multimodal generative AI platform. It pairs two custom Arabic LLMs, the 7B Fanar Star and the 9B Fanar Prime, behind an orchestrator that routes each prompt to the appropriate model. Around them, the platform integrates an Islamic-knowledge-enhanced RAG system, a recency-aware RAG for temporal grounding, dialect-aware ASR, regionally adapted text-to-image generation, and a content attribution service for provenance-aware verification. Experimental results demonstrate state-of-the-art performance across major Arabic benchmarks: a 32% improvement in religious QA accuracy, minute-level news response latency, and a 41% reduction in cross-dialect ASR word error rate.

📝 Abstract
We present Fanar, a platform for Arabic-centric multimodal generative AI systems that supports language, speech, and image generation tasks. At the heart of Fanar are Fanar Star and Fanar Prime, two highly capable Arabic Large Language Models (LLMs) that are best in class on well-established benchmarks for similarly sized models. Fanar Star is a 7B (billion) parameter model that was trained from scratch on nearly 1 trillion clean and deduplicated Arabic, English, and code tokens. Fanar Prime is a 9B parameter model continually trained on the Gemma-2 9B base model on the same 1 trillion token set. Both models are deployed concurrently and are designed to address different types of prompts, which are transparently routed through a custom-built orchestrator. The Fanar platform provides many other capabilities, including a customized Islamic Retrieval Augmented Generation (RAG) system for handling religious prompts and a Recency RAG for summarizing information about current or recent events that occurred after the pre-training data cut-off date. The platform provides additional cognitive capabilities, including in-house bilingual speech recognition that supports multiple Arabic dialects, and voice and image generation that is fine-tuned to better reflect regional characteristics. Finally, Fanar provides an attribution service that can be used to verify the authenticity of fact-based generated content. The design, development, and implementation of Fanar were entirely undertaken at Hamad Bin Khalifa University's Qatar Computing Research Institute (QCRI) and were sponsored by Qatar's Ministry of Communications and Information Technology to enable sovereign AI technology development.
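The abstract says prompts are transparently routed to different components but does not describe how the orchestrator decides. The following is a minimal sketch of one way such routing could look, not the Fanar implementation: the `Route` enum, the `route_prompt` function, the keyword sets, and the length-based choice between the two LLMs are all hypothetical placeholders introduced here for illustration.

```python
from enum import Enum


class Route(Enum):
    """Destinations named in the abstract (enum itself is hypothetical)."""
    ISLAMIC_RAG = "islamic_rag"
    RECENCY_RAG = "recency_rag"
    FANAR_STAR = "fanar_star"
    FANAR_PRIME = "fanar_prime"


# Hypothetical keyword sets; a real orchestrator would more likely
# use a trained classifier than fixed word lists.
RELIGIOUS_TERMS = {"quran", "hadith", "fiqh", "zakat"}
RECENCY_TERMS = {"today", "latest", "yesterday", "breaking"}


def route_prompt(prompt: str, long_prompt_words: int = 50) -> Route:
    """Pick a destination for a prompt using simple keyword rules."""
    tokens = prompt.split()
    words = {t.strip(".,?!").lower() for t in tokens}

    # Religious prompts go to the Islamic RAG system.
    if words & RELIGIOUS_TERMS:
        return Route.ISLAMIC_RAG
    # Prompts about current events go to the Recency RAG.
    if words & RECENCY_TERMS:
        return Route.RECENCY_RAG
    # Assumed placeholder rule: the source does not say which prompt
    # types each LLM handles, so length is used only as a stand-in.
    if len(tokens) > long_prompt_words:
        return Route.FANAR_PRIME
    return Route.FANAR_STAR
```

For example, `route_prompt("What is zakat?")` selects `Route.ISLAMIC_RAG`, while a short general prompt falls through to `Route.FANAR_STAR`. Checking the retrieval-backed routes first reflects the abstract's framing that religious and recent-event prompts need grounding beyond the models' pre-training data.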
Problem

Research questions and friction points this paper is trying to address.

Arabic Cultural Customization
Multifunctional AI System
Dialect Recognition and Content Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Arab-Culture AI
Multimodal Generation
Indigenous Technology Advancement
Fanar Team Ummar Abbas
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Mohammad Shahmeer Ahmad
Research Engineer, Qatar Computing Research Institute
Information Retrieval, Data Centric AI, AI Systems, LLMs
Firoj Alam
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Enes Altinisik
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Ehsannedin Asgari
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Yazan Boshmaf
Qatar Computing Research Institute, HBKU
Cybersecurity, AI, Web3
Sabri Boughorbel
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Sanjay Chawla
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Shammur A. Chowdhury
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Fahim Dalvi
Qatar Computing Research Institute
Deep Learning, Machine Translation, Artificial Intelligence, Explainable AI
Kareem Darwish
QCRI
Information Retrieval, Natural Language Processing, Arabic Natural Language Processing, Arabic NLP
Nadir Durrani
Senior Scientist, QCRI, HBKU
Machine Translation, Interpretability, Transliteration, Word Segmentation, Natural Language Processing
M. Elfeky
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Ahmed Elmagarmid
Executive Director, Qatar Computing Research Institute
Database Systems
Mohamed Y. Eltabakh
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
M. Fatehkia
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Anastasios Fragkopoulos
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Maram Hasanain
Postdoc, Qatar Computing Research Institute, HBKU
Information Retrieval, Social Media
Majd Hawasly
QCRI, Hamad Bin Khalifa University
Autonomous systems, Lifelong learning, Natural Language Processing
Mus'ab Husaini
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Soon-Gyo Jung
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
J. Lucas
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Walid Magdy
School of Informatics, The University of Edinburgh
Computational Social Science, Natural Language Processing, Arabic Natural Language Processing
Safa Messaoud
Scientist, Qatar Computing Research Institute (QCRI)
Safe AI, Energy Based Models, Reinforcement Learning, Computer Vision, Health intelligence
Abubakr Mohamed
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Tasnim Mohiuddin
Scientist, QCRI, HBKU
Machine Learning, Natural Language Processing
Basel Mousi
Qatar Computing Research Institute
Natural Language Processing
Hamdy Mubarak
Principal Software Engineer, Qatar Computing Research Institute (QCRI), Qatar Foundation
Natural Language Processing, Software Engineering, Information Extraction, Social Media Analysis, Arabic NLP
Ahmad Musleh
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Z. Naeem
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
M. Ouzzani
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Dorde Popovic
Qatar Computing Research Institute, HBKU
Neural Trojan Backdoors, Adversarial Learning, Robust Machine Learning, Federated Learning, Ethics of Artificial Intelligence
Amin Sadeghi
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
H. Sencar
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Mohammed Shinoy
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Omar Sinan
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Yifan Zhang
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Ahmed Ali
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Yassine El Kheir
PhD Researcher, German Research Center for Artificial Intelligence (DFKI) & TU Berlin
Speech Deepfake Detection, Self-Supervised Learning, Pronunciation Assessment
Xiaosong Ma
Qatar Computing Research Institute (QCRI), Hamad Bin Khalifa University
Chaoyi Ruan
National University of Singapore