Contextual Evaluation of Large Language Models for Classifying Tropical and Infectious Diseases

📅 2024-09-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the performance and interpretability of large language models (LLMs) in tropical and infectious disease (TID) diagnosis. To address the scarcity of domain-specific, context-rich evaluation data, we construct and expand the open-source TRINDs dataset to over 11,000 question-answer pairs, each annotated with demographic (age, sex, geography), clinical, and risk-factor context. We propose a multi-model comparative evaluation framework benchmarked against expert clinician judgments to systematically assess classification accuracy of both general-purpose and medical LLMs. Our analysis provides the first empirical evidence that demographic and geographic context significantly influences LLM diagnostic outputs, with context injection improving average accuracy by 12.3%. Furthermore, we develop TRINDs-LM—a prototype tool enabling context-sensitive response visualization and attribution analysis—thereby establishing a novel paradigm for advancing controllability, interpretability, and fairness in clinical LLM applications.

📝 Abstract
While large language models (LLMs) have shown promise for medical question answering, there is limited work focused specifically on tropical and infectious diseases. We build on an open-source tropical and infectious diseases (TRINDs) dataset, expanding it with demographic and semantic clinical and consumer augmentations to yield 11,000+ prompts. We evaluate LLM performance on these prompts, comparing generalist and medical LLMs, and comparing LLM outcomes to those of human experts. We demonstrate, through systematic experimentation, the benefit of contextual information such as demographics, location, gender, and risk factors for optimal LLM response. Finally, we develop a prototype of TRINDs-LM, a research tool that provides a playground for exploring how context impacts LLM outputs for health.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Tropical Diseases
Infectious Diseases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Patient-specific Information
TRINDs-LM Tool
M. Asiedu
Google Research
Nenad Tomašev
Google DeepMind
Chintan Ghate
Google Research
Tiya Tiyasirichokchai
Google Research
Awa Dieng
Google DeepMind
Oluwatosin Akande
Graduate Student of Industrial and Systems Engineering, Lehigh University
PDE Optimization, Machine Learning, Scientific Computing
Geoffrey Siwo
University of Michigan
Steve Adudans
Gear Health
Sylvanus Aitkins
Ministry of Health, Sierra Leone
Odianosen Ehiakhamen
Nigerian Center for Disease Control
Eric Ndombi
Kenyatta University
Katherine Heller
Google Research
Machine Learning, Health AI, Ethical AI