Transformer Tafsir at QIAS 2025 Shared Task: Hybrid Retrieval-Augmented Generation for Islamic Knowledge Question Answering

📅 2025-09-28
🤖 AI Summary
This study addresses the limited semantic understanding and reasoning capabilities of large language models (LLMs) in Islamic knowledge question answering. It proposes a three-stage hybrid retrieval-augmented generation (RAG) framework: (1) initial keyword-based retrieval using BM25; (2) semantic matching via a dense embedding model; and (3) re-ranking of candidate passages with a cross-encoder. Combining sparse and dense retrieval improves semantic matching and answer quality on religious texts. On the two Islamic knowledge subtasks, evaluated with two LLMs (Fanar and Mistral), the pipeline improves accuracy by up to 25% over baseline methods, depending on the task and model configuration. The best configuration uses Fanar and reaches 45% accuracy on Subtask 1 and 80% on Subtask 2.

📝 Abstract
This paper presents our submission to the QIAS 2025 shared task on Islamic knowledge understanding and reasoning. We developed a hybrid retrieval-augmented generation (RAG) system that combines sparse and dense retrieval methods with cross-encoder reranking to improve large language model (LLM) performance. Our three-stage pipeline incorporates BM25 for initial retrieval, a dense embedding retrieval model for semantic matching, and cross-encoder reranking for precise content retrieval. We evaluate our approach on both subtasks using two LLMs, Fanar and Mistral, demonstrating that the proposed RAG pipeline enhances performance across both, with accuracy improvements up to 25%, depending on the task and model configuration. Our best configuration is achieved with Fanar, yielding accuracy scores of 45% in Subtask 1 and 80% in Subtask 2.
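The three-stage pipeline described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes the stages run as a cascade (BM25 narrows the pool, dense similarity narrows it further, the cross-encoder orders what remains); the paper may instead merge sparse and dense candidates. The names `toy_embed`, `toy_cross_score`, `hybrid_retrieve`, `k_sparse`, and `k_dense` are hypothetical, and the embedding and cross-encoder are deliberately simple stand-ins for trained models.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Stage 1: score every passage against the query with Okapi BM25.

    Assumes `docs` is a non-empty list of strings.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        length_norm = k1 * (1 - b + b * len(toks) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def toy_embed(text):
    """Assumption: stand-in for a dense encoder (character-trigram counts)."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def toy_cross_score(query, passage):
    """Assumption: stand-in for a cross-encoder (Jaccard overlap of token sets)."""
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / len(q | p) if q | p else 0.0


def hybrid_retrieve(query, docs, cross_score=toy_cross_score, k_sparse=3, k_dense=2):
    """Cascade: BM25 top-k, then dense top-k, then cross-encoder ordering."""
    sparse = bm25_scores(query, docs)
    sparse_ids = sorted(range(len(docs)), key=lambda i: sparse[i], reverse=True)[:k_sparse]
    q_vec = toy_embed(query)
    dense_ids = sorted(sparse_ids, key=lambda i: cosine(q_vec, toy_embed(docs[i])), reverse=True)[:k_dense]
    reranked = sorted(dense_ids, key=lambda i: cross_score(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked]


passages = [
    "zakat is an obligatory charity paid on wealth",
    "fasting in ramadan lasts from dawn to sunset",
    "the pilgrimage to mecca is called hajj",
    "charity given voluntarily is called sadaqah",
]
context = hybrid_retrieve("what is the obligatory charity on wealth", passages)
print(context)
```

In a RAG setting the returned passages would be placed in the LLM prompt as context before the question. Swapping `toy_embed` and `toy_cross_score` for trained models changes the scores but not the control flow shown here.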
Problem

Research questions and friction points this paper is trying to address.

Improving Islamic knowledge question answering accuracy
Enhancing large language models with hybrid retrieval methods
Combining sparse and dense retrieval with reranking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid RAG system combining sparse and dense retrieval
Three-stage pipeline with BM25 and cross-encoder reranking
Enhanced LLM performance using semantic matching techniques
Muhammad Abu Ahmad
Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
Mohamad Ballout
PhD in Cognitive Science, University of Osnabrück
Computer Vision, Deep Learning, Cognitive Science
Raia Abu Ahmad
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI), Berlin, Germany
Elia Bruni
University of Osnabrück
Natural Language Processing, Computational Dialogue Modelling, Computer Vision, Machine Learning