Statistics, Not Scale: Modular Medical Dialogue with Bayesian Belief Engine

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses what the authors identify as an architectural flaw in current large language models (LLMs) used for medical diagnosis: natural-language communication is conflated with probabilistic reasoning. To resolve this, the authors propose BMBE (Bayesian Medical Belief Engine), a framework that strictly decouples linguistic processing from diagnostic inference: the LLM functions solely as a “language sensor” that parses patient utterances into structured evidence and verbalises questions, while all diagnostic reasoning is delegated to a separate, deterministic, auditable Bayesian engine. According to the abstract, this design makes the system private by construction, lets the statistical backend be replaced per target population without retraining, supports calibrated selective diagnosis with an adjustable accuracy-coverage tradeoff, and is robust to adversarial patient communication styles. The authors report that even a cheap sensor paired with the engine outperforms a frontier standalone model from the same family at a fraction of the cost, across both empirical and LLM-generated knowledge bases, which they take as evidence that the advantage is architectural rather than informational.

📝 Abstract

Large language models are increasingly deployed as autonomous diagnostic agents, yet they conflate two fundamentally different capabilities: natural-language communication and probabilistic reasoning. We argue that this conflation is an architectural flaw, not an engineering shortcoming. We introduce BMBE (Bayesian Medical Belief Engine), a modular diagnostic dialogue framework that enforces a strict separation between language and reasoning: an LLM serves only as a sensor, parsing patient utterances into structured evidence and verbalising questions, while all diagnostic inference resides in a deterministic, auditable Bayesian engine. Because patient data never enters the LLM, the architecture is private by construction; because the statistical backend is a standalone module, it can be replaced per target population without retraining. This separation yields three properties no autonomous LLM can offer: calibrated selective diagnosis with a continuously adjustable accuracy-coverage tradeoff, a statistical separation gap where even a cheap sensor paired with the engine outperforms a frontier standalone model from the same family at a fraction of the cost, and robustness to adversarial patient communication styles that cause standalone doctors to collapse. We validate across empirical and LLM-generated knowledge bases against frontier LLMs, confirming the advantage is architectural, not informational.
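The separation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge base, the naive-Bayes independence assumption, and the names `PRIOR`, `LIKELIHOOD`, `update`, and `diagnose` are all hypothetical, chosen only to show how structured evidence (which a language sensor would produce) could feed a deterministic Bayesian update, and how a confidence threshold could give an adjustable accuracy-coverage tradeoff.

```python
from typing import Dict, Optional

# Hypothetical toy knowledge base (not from the paper):
# prior P(disease) and P(finding present | disease).
PRIOR: Dict[str, float] = {"flu": 0.5, "cold": 0.5}
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "flu": {"fever": 0.9, "sneezing": 0.3},
    "cold": {"fever": 0.2, "sneezing": 0.8},
}


def update(belief: Dict[str, float], finding: str, present: bool) -> Dict[str, float]:
    """Apply Bayes' rule for one structured piece of evidence.

    Assumes findings are conditionally independent given the disease
    (a naive-Bayes simplification made for this sketch only).
    """
    unnormalised = {}
    for disease, p in belief.items():
        p_finding = LIKELIHOOD[disease][finding]
        unnormalised[disease] = p * (p_finding if present else 1.0 - p_finding)
    total = sum(unnormalised.values())
    return {disease: v / total for disease, v in unnormalised.items()}


def diagnose(belief: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the most probable disease, or None (abstain) if the
    posterior is below the threshold. Raising the threshold trades
    coverage for accuracy."""
    best = max(belief, key=belief.get)
    return best if belief[best] >= threshold else None


# Structured evidence as a language sensor might emit it: (finding, present).
belief = update(PRIOR, "fever", True)
print(diagnose(belief, 0.9))   # abstains: posterior for "flu" is about 0.82
belief = update(belief, "sneezing", False)
print(diagnose(belief, 0.9))   # commits to "flu": posterior is about 0.94
```

In this sketch the engine never sees free text, only `(finding, present)` pairs, so the same update code is fully inspectable and the knowledge base can be swapped without touching whatever component produces the evidence.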

Problem

Research questions and friction points this paper is trying to address.

medical dialogue

probabilistic reasoning

large language models

diagnostic agents

Bayesian inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

modular architecture

Bayesian inference

diagnostic reasoning