On the Performance of an Explainable Language Model on PubMedQA

📅 2025-04-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models (LLMs) used for medical question answering have four main limitations: poor interpretability, a tendency to hallucinate, difficulty of maintenance, and high computational cost. To address them, this paper presents Gyan, an explainable, compositional language model whose architecture is decoupled from knowledge. The authors describe Gyan as transparent, free of hallucination, transferable across domains, and not dependent on significant training or compute resources. On PubMedQA, Gyan-4.3 reaches 87.1% accuracy, compared with 82.0% for MedPrompt (based on GPT-4) and 81.8% for Med-PaLM 2, which the authors report as a new state of the art.

📝 Abstract

Large language models (LLMs) have shown significant abilities in retrieving medical knowledge, reasoning over it, and answering medical questions comparably to physicians. However, these models are not interpretable, hallucinate, are difficult to maintain, and require enormous compute resources for training and inference. In this paper, we report results on the PubMedQA dataset from Gyan, an explainable language model based on an alternative architecture. Gyan is a compositional language model, and the model is decoupled from knowledge. Gyan is trustworthy and transparent, does not hallucinate, and does not require significant training or compute resources. It is also easily transferable across domains. Gyan-4.3 achieves SOTA results on PubMedQA with 87.1% accuracy, compared to 82% for MedPrompt based on GPT-4 and 81.8% for Med-PaLM 2 (Google and DeepMind). We will report results for other medical datasets (MedQA, MedMCQA, MMLU-Medicine) in the future.
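
The comparison in the abstract comes down to simple arithmetic on accuracy scores. The sketch below is illustrative only: the three accuracy figures are those reported in the abstract, while the function names (`accuracy`, `margin_over`) and the toy predictions in the usage lines are hypothetical and are not taken from the paper. It assumes the usual PubMedQA setup, in which each question is answered with one of yes, no, or maybe and accuracy is the share of exact matches.

```python
# Reported PubMedQA accuracies (percent), as stated in the abstract.
reported_accuracy = {
    "Gyan-4.3": 87.1,
    "MedPrompt (GPT-4)": 82.0,
    "Med-PaLM 2": 81.8,
}


def accuracy(predictions, gold_labels):
    """Fraction of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    if not gold_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


def margin_over(baseline, system="Gyan-4.3"):
    """Percentage-point gap between a system and a baseline (hypothetical helper)."""
    return round(reported_accuracy[system] - reported_accuracy[baseline], 1)


# Gaps implied by the reported figures, in percentage points.
for name in ("MedPrompt (GPT-4)", "Med-PaLM 2"):
    print(f"Gyan-4.3 vs {name}: +{margin_over(name)} points")

# Toy example with made-up labels, only to show how accuracy is computed.
toy_predictions = ["yes", "no", "maybe", "yes"]
toy_gold = ["yes", "no", "yes", "yes"]
print(accuracy(toy_predictions, toy_gold))
```

The gaps are stated in percentage points rather than relative improvement, which matches how the abstract presents the comparison.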

Problem

Research questions and friction points this paper is trying to address.

LLMs lack interpretability and hallucinate in medical QA

Existing models require enormous compute resources for training and inference

Current medical QA models are difficult to maintain and transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable language model with alternative architecture

Decoupled knowledge for trustable and transparent results

Achieves SOTA on PubMedQA (87.1% accuracy) without significant training or compute resources

🔎 Similar Papers

Retrieve to Explain: Evidence-driven Predictions for Explainable Drug Target Identification