On the Performance of an Explainable Language Model on PubMedQA

📅 2025-04-07

🤖 AI Summary
To address limitations of large language models (LLMs) in medical question answering, including poor interpretability, a tendency to hallucinate, maintenance difficulty, and high computational cost, this paper presents Gyan, an interpretable compositional language model whose architecture is decoupled from its knowledge. According to the authors, this separation makes the model transparent, free of hallucination, inexpensive to train and run, and easy to transfer across domains. On PubMedQA, Gyan-4.3 is reported to reach 87.1% accuracy, ahead of MedPrompt based on GPT-4 (82.0%) and Med-PaLM 2 (81.8%), which the authors describe as a new state of the art. The authors position this as a route to trustworthy, efficient, and maintainable medical AI systems.

📝 Abstract
Large language models (LLMs) have shown significant abilities in retrieving medical knowledge, reasoning over it, and answering medical questions comparably to physicians. However, these models are not interpretable, hallucinate, are difficult to maintain, and require enormous compute resources for training and inference. In this paper, we report results from Gyan, an explainable language model based on an alternative architecture, on the PubMedQA data set. Gyan is a compositional language model, and the model is decoupled from knowledge. Gyan is trustworthy and transparent, does not hallucinate, and does not require significant training or compute resources. Gyan is easily transferable across domains. Gyan-4.3 achieves SOTA results on PubMedQA with 87.1% accuracy, compared to 82% by MedPrompt based on GPT-4 and 81.8% by Med-PaLM 2 (Google and DeepMind). We will be reporting results for other medical data sets (MedQA, MedMCQA, and MMLU-Medicine) in the future.
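
To make the reported comparison concrete, here is a minimal Python sketch of the arithmetic behind it. The accuracy figures are the ones quoted in the abstract. The function names `accuracy` and `margins_over` are hypothetical helpers introduced only for illustration, and the toy predictions are made up. PubMedQA answers take one of three labels (yes, no, maybe), and accuracy is assumed here to be the plain share of exact label matches. This is not the authors' evaluation code.

```python
# Accuracies (percent) as quoted in the abstract; nothing else is taken from the paper.
REPORTED_ACCURACY = {
    "Gyan-4.3": 87.1,
    "MedPrompt (GPT-4)": 82.0,
    "Med-PaLM 2": 81.8,
}


def accuracy(predictions, labels):
    """Percentage of predictions that exactly match the gold labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def margins_over(scores, model):
    """Percentage-point gap between `model` and every other entry in `scores`."""
    base = scores[model]
    return {
        name: round(base - value, 1)
        for name, value in scores.items()
        if name != model
    }


# Toy example with invented predictions, only to show how accuracy is computed.
toy_gold = ["yes", "no", "maybe", "yes"]
toy_pred = ["yes", "no", "yes", "yes"]
toy_accuracy = accuracy(toy_pred, toy_gold)

# Gap between the reported Gyan-4.3 score and the two baselines.
gaps = margins_over(REPORTED_ACCURACY, "Gyan-4.3")
print(toy_accuracy, gaps)
```

On the quoted figures, the gap works out to 5.1 percentage points over MedPrompt and 5.3 over Med-PaLM 2.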
Problem

Research questions and friction points this paper is trying to address.

LLMs lack interpretability and hallucinate in medical QA
Existing models require enormous compute resources for training and inference
Current medical QA models are difficult to maintain and transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable language model with alternative architecture
Decoupled knowledge for trustable and transparent results
Reports SOTA accuracy on PubMedQA (87.1%) without significant training or compute resources