🤖 AI Summary
This study addresses the challenge of integrating classical Islamic medical knowledge, such as that found in Avicenna's *Canon of Medicine* and *Prophetic Medicine*, with modern AI systems in a trustworthy, culturally grounded manner. Method: We propose Tibbe-AG, an evaluation framework built on 30 preventive and holistic healthcare questions and three answering configurations: direct answering, retrieval-augmented generation (RAG), and self-critique. We introduce the culture-adapted 3C3H quality scoring system and a scientific self-critique filtering mechanism, with a secondary LLM agent acting as judge. Our methodology combines prompt engineering driven by an Islamic medical corpus, multi-stage LLM agent chains (generation → self-evaluation → scoring), and a structured assessment protocol. Contribution/Results: Experiments show that RAG improves factual accuracy by 13%, with self-critique yielding an additional 10% gain. Qwen2-7B achieves the best overall performance among the evaluated models. This work establishes a paradigm for culturally sensitive, interpretable, and safe AI-assisted healthcare.
📝 Abstract
Centuries-old Islamic medical texts like Avicenna's Canon of Medicine and the Prophetic Tibb-e-Nabawi encode a wealth of preventive care, nutrition, and holistic therapies, yet remain inaccessible to many and underutilized in modern AI systems. Existing language-model benchmarks focus narrowly on factual recall or user preference, leaving a gap in validating culturally grounded medical guidance at scale. We propose a unified evaluation pipeline, Tibbe-AG, that aligns 30 carefully curated Prophetic-medicine questions with human-verified remedies and compares three LLMs (LLaMA-3, Mistral-7B, Qwen2-7B) under three configurations: direct generation, retrieval-augmented generation, and a scientific self-critique filter. Each answer is then assessed by a secondary LLM serving as an agentic judge, yielding a single 3C3H quality score. Retrieval improves factual accuracy by 13%, while the agentic prompt adds another 10% improvement through deeper mechanistic insight and safety considerations. Our results demonstrate that blending classical Islamic texts with retrieval and self-evaluation enables reliable, culturally sensitive medical question-answering.
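The evaluation loop described above (three models, three configurations, one judge score per answer) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Item`, `build_prompt`, `evaluate`, and the `toy_*` stand-ins are hypothetical, the prompt wording is invented, and the sketch assumes the self-critique configuration is layered on top of retrieval, which the abstract suggests but does not state outright. Real model, retriever, and judge calls would replace the toy callables.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Tuple

MODELS = ["LLaMA-3", "Mistral-7B", "Qwen2-7B"]   # models named in the abstract
CONFIGS = ["direct", "rag", "self_critique"]     # hypothetical config labels


@dataclass
class Item:
    question: str
    reference: str  # human-verified remedy used by the judge


def build_prompt(config: str, question: str, passages: List[str]) -> str:
    """Assemble a prompt; the wording here is purely illustrative."""
    parts = []
    if passages:
        parts.append("Context:\n" + "\n".join(passages))
    parts.append("Question: " + question)
    if config == "self_critique":
        parts.append("Critique your draft for scientific plausibility and safety.")
    return "\n\n".join(parts)


def evaluate(
    items: List[Item],
    generate: Callable[[str, str], str],      # (model, prompt) -> answer
    retrieve: Callable[[str], List[str]],     # question -> passages
    judge: Callable[[str, str, str], float],  # (question, answer, reference) -> score
) -> Dict[Tuple[str, str], float]:
    """Return the mean judge score for every (model, configuration) pair."""
    scores = {}
    for model, config in product(MODELS, CONFIGS):
        per_item = []
        for item in items:
            # Assumption: only the direct configuration skips retrieval.
            passages = [] if config == "direct" else retrieve(item.question)
            answer = generate(model, build_prompt(config, item.question, passages))
            per_item.append(judge(item.question, answer, item.reference))
        scores[(model, config)] = mean(per_item)
    return scores


# Toy stand-ins so the sketch runs without any model or corpus.
def toy_generate(model: str, prompt: str) -> str:
    return f"[{model}] {prompt}"


def toy_retrieve(question: str) -> List[str]:
    return ["placeholder passage"]


def toy_judge(question: str, answer: str, reference: str) -> float:
    return 1.0 if "Context:" in answer else 0.5


items = [Item("placeholder question", "placeholder remedy")]
scores = evaluate(items, toy_generate, toy_retrieve, toy_judge)
```

Passing the generator, retriever, and judge in as callables keeps the loop independent of any particular model API, so the same nine-cell grid can be filled in with real components later.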