Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease

📅 2025-02-20

🤖 AI Summary

Large language models (LLMs) struggle to diagnose rare diseases accurately in primary care because training data is scarce and the models are biased toward common conditions. Method: This paper proposes RareScale, a framework that combines expert systems with black-box LLMs (e.g., GPT-4o) to synthesize rare-disease-specific consultation dialogues; trains a lightweight candidate predictor on those dialogues to supply the LLM with rare-disease candidates as additional input; and designs a multi-stage diagnostic pipeline unifying rule-based reasoning and supervised fine-tuning to balance rare- and common-disease diagnostic capabilities. Contributions/Results: Evaluated on over 575 rare diseases, RareScale achieves a 17.1% absolute improvement in Top-5 diagnostic accuracy over black-box LLM baselines and 88.8% candidate generation accuracy on GPT-4o-generated chats.

📝 Abstract

Large language models (LLMs) have demonstrated impressive capabilities in disease diagnosis. However, their effectiveness in identifying rarer diseases, which are inherently more challenging to diagnose, remains an open question. Rare disease performance is critical given the increasing use of LLMs in healthcare settings. This is especially true if a primary care physician needs to make a rarer prognosis from only a patient conversation so that they can take the appropriate next step. To that end, several clinical decision support systems are designed to support providers in rare disease identification. Yet their utility is limited by their lack of knowledge of common disorders and their difficulty of use. In this paper, we propose RareScale to combine the knowledge of LLMs with expert systems. We jointly use an expert system and an LLM to simulate rare disease chats. This data is used to train a rare disease candidate predictor model. Candidates from this smaller model are then used as additional inputs to a black-box LLM to make the final differential diagnosis. Thus, RareScale allows for a balance between rare and common diagnoses. We present results on over 575 rare diseases, beginning with Abdominal Actinomycosis and ending with Wilson's Disease. Our approach significantly improves the baseline performance of black-box LLMs by over 17% in Top-5 accuracy. We also find that our candidate generation performance is high (e.g., 88.8% on gpt-4o generated chats).
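
To make the last two steps of the abstract concrete, here is a minimal Python sketch of how candidates from a smaller model might be passed to a black-box LLM as additional input, and how a Top-5 accuracy score could be computed. This is not the paper's implementation: the function names `build_ddx_prompt` and `top_k_accuracy`, the prompt wording, and the case-insensitive exact-match scoring rule are all assumptions made for illustration, and no LLM is actually called.

```python
from typing import Sequence


def build_ddx_prompt(chat: str, rare_candidates: Sequence[str], n: int = 5) -> str:
    """Assemble a prompt that offers rare-disease candidates as optional hints.

    Hypothetical prompt format; the paper's actual prompt is not given here.
    """
    candidate_lines = "\n".join(f"- {name}" for name in rare_candidates)
    return (
        "Patient conversation:\n"
        f"{chat}\n\n"
        "Rare-disease candidates to consider (they may all be wrong):\n"
        f"{candidate_lines}\n\n"
        f"List the {n} most likely diagnoses, most likely first. "
        "Include common conditions where they fit better."
    )


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of cases whose true diagnosis is among the first k predictions.

    Matching is case-insensitive exact string equality (an assumption; a real
    evaluation would likely need synonym or ontology-based matching).
    """
    if len(ranked_predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        return 0.0
    hits = 0
    for preds, truth in zip(ranked_predictions, truths):
        top_k = {p.strip().lower() for p in preds[:k]}
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(truths)
```

In this sketch the candidates are framed as hints rather than as a closed answer set, which is one way to keep common diagnoses available to the LLM, in line with the balance between rare and common diagnoses that the abstract describes.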

Problem

Research questions and friction points this paper is trying to address.

Improves rare disease diagnosis accuracy

Combines LLMs with expert systems

Enhances differential diagnosis performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLMs with expert systems

Simulates rare disease chats

Improves LLM diagnosis accuracy
