Investigating LLMs in Clinical Triage: Promising Capabilities, Persistent Intersectional Biases

📅 2025-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study systematically evaluates the robustness and fairness of large language models (LLMs) in emergency triage, focusing on distribution shift, missing data handling, and intersectional bias across sex and race. We propose a multi-strategy LLM triage evaluation framework that integrates continued pre-training, in-context learning, and hybrid machine learning, augmented with counterfactual reasoning and robustness diagnostics. Our work is the first to empirically uncover significant intersectional bias in clinical triage: LLMs exhibit markedly reduced recommendation consistency for Black women, revealing implicit demographic preferences that may compromise real-world decision-making. Experiments show that LLMs outperform traditional models under data scarcity and distribution shift, but their fairness deficiencies remain pronounced. This research provides empirical evidence and methodological foundations for robust deployment and bias mitigation of AI in healthcare.

📝 Abstract

Large Language Models (LLMs) have shown promise in clinical decision support, yet their application to triage remains underexplored. We systematically investigate the capabilities of LLMs in emergency department triage through two key dimensions: (1) robustness to distribution shifts and missing data, and (2) counterfactual analysis of intersectional biases across sex and race. We assess multiple LLM-based approaches, ranging from continued pre-training to in-context learning, as well as machine learning approaches. Our results indicate that LLMs exhibit superior robustness, and we investigate the key factors contributing to the promising LLM-based approaches. Furthermore, in this setting, we identify gaps in LLM preferences that emerge in particular intersections of sex and race. LLMs generally exhibit sex-based differences, but they are most pronounced in certain racial groups. These findings suggest that LLMs encode demographic preferences that may emerge in specific clinical contexts or particular combinations of characteristics.

Problem

Research questions and friction points this paper is trying to address.

Assessing LLM robustness in clinical triage scenarios

Investigating intersectional biases in LLMs across sex and race

Evaluating LLM performance under distribution shifts and missing data
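
One way to make the missing-data question concrete is to blank out input fields at increasing rates and track how accuracy changes. The sketch below is illustrative only and is not the paper's protocol: the field names, the acuity labels, and `toy_predict` (a stand-in for any triage model, LLM or otherwise) are all hypothetical.

```python
import random

def mask_fields(record, fields, rate, rng):
    """Copy a record, setting each listed field to None with probability `rate`."""
    return {
        k: (None if k in fields and rng.random() < rate else v)
        for k, v in record.items()
    }

def accuracy_under_missingness(records, labels, predict, fields, rates, seed=0):
    """Accuracy of `predict` at each masking rate (seeded for repeatability)."""
    rng = random.Random(seed)
    results = {}
    for rate in rates:
        correct = sum(
            predict(mask_fields(rec, fields, rate, rng)) == label
            for rec, label in zip(records, labels)
        )
        results[rate] = correct / len(records)
    return results

# Hypothetical stand-in for a triage model: flags high acuity (2) only
# when a heart rate is present and above 120, otherwise returns 3.
def toy_predict(record):
    hr = record.get("heart_rate")
    return 2 if hr is not None and hr > 120 else 3

records = [{"heart_rate": 130, "pain": 8}, {"heart_rate": 80, "pain": 2}]
labels = [2, 3]
result = accuracy_under_missingness(
    records, labels, toy_predict, fields={"heart_rate"}, rates=[0.0, 1.0]
)
print(result)
```

Swapping `toy_predict` for a real model call keeps the harness unchanged, so different approaches can be compared at the same masking rates.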

Innovation

Methods, ideas, or system contributions that make the work stand out.

Assessing the robustness of LLMs in emergency triage

Analyzing intersectional biases via counterfactual methods

Comparing continued pre-training and in-context learning approaches
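
The counterfactual idea can be sketched as follows: hold the clinical vignette fixed, vary only the stated sex and race, and compare the model's output across variants. This is a minimal sketch under stated assumptions, not the authors' implementation. The race categories, the prompt template, and `stub_predict` (which stands in for an LLM call and is deliberately biased so the gap is visible) are hypothetical.

```python
from itertools import product
from statistics import mean

SEXES = ["female", "male"]
RACES = ["Asian", "Black", "Hispanic", "White"]  # hypothetical category list

def counterfactual_variants(template):
    """One vignette per (sex, race) pair, identical apart from demographics."""
    return {(s, r): template.format(sex=s, race=r) for s, r in product(SEXES, RACES)}

def sex_gap_by_race(templates, predict):
    """Mean (female minus male) predicted acuity within each race.

    A gap near zero in every race suggests no sex-based difference;
    a gap concentrated in one race points to an intersectional effect.
    """
    gaps = {r: [] for r in RACES}
    for template in templates:
        variants = counterfactual_variants(template)
        for r in RACES:
            female = predict(variants[("female", r)])
            male = predict(variants[("male", r)])
            gaps[r].append(female - male)
    return {r: mean(values) for r, values in gaps.items()}

# Hypothetical stand-in for an LLM: returns acuity 3 for everyone except
# one intersection, which is deliberately shifted to 2 for illustration.
def stub_predict(prompt):
    return 2 if "Black female" in prompt else 3

templates = [
    "A {race} {sex} patient, age 54, presents with chest pain.",
    "A {race} {sex} patient, age 31, presents with abdominal pain.",
]
gaps = sex_gap_by_race(templates, stub_predict)
print(gaps)
```

Reporting the gap per race, rather than one pooled sex gap, is what lets a difference confined to a single intersection show up instead of being averaged away.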
