Investigating LLMs in Clinical Triage: Promising Capabilities, Persistent Intersectional Biases

2025-04-22
AI Summary
This study systematically evaluates the robustness and fairness of large language models (LLMs) in emergency triage, focusing on distribution shift, missing-data handling, and intersectional bias across sex and race. We propose a multi-strategy LLM triage evaluation framework that integrates continued pre-training, in-context learning, and hybrid machine learning, augmented with counterfactual reasoning and robustness diagnostics. Our work is the first to empirically uncover significant intersectional bias in clinical triage: LLMs exhibit markedly reduced recommendation consistency for Black women, revealing implicit demographic preferences that may compromise real-world decision-making. Experiments demonstrate that LLMs outperform traditional models under data scarcity and distribution shift; however, their fairness deficiencies remain pronounced. This research provides empirical evidence and methodological foundations for robust deployment and bias mitigation of AI in healthcare.

Abstract
Large Language Models (LLMs) have shown promise in clinical decision support, yet their application to triage remains underexplored. We systematically investigate the capabilities of LLMs in emergency department triage through two key dimensions: (1) robustness to distribution shifts and missing data, and (2) counterfactual analysis of intersectional biases across sex and race. We assess multiple LLM-based approaches, ranging from continued pre-training to in-context learning, as well as machine learning approaches. Our results indicate that LLMs exhibit superior robustness, and we investigate the key factors contributing to the promising LLM-based approaches. Furthermore, in this setting, we identify gaps in LLM preferences that emerge in particular intersections of sex and race. LLMs generally exhibit sex-based differences, but they are most pronounced in certain racial groups. These findings suggest that LLMs encode demographic preferences that may emerge in specific clinical contexts or particular combinations of characteristics.
Problem

Research questions and friction points this paper is trying to address.

Assessing LLM robustness in clinical triage scenarios
Investigating intersectional biases in LLMs across sex and race
Evaluating LLM performance under distribution shifts and missing data
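One way to probe the missing-data question above is to blank out selected input fields at increasing rates and track how accuracy degrades. The following is a minimal sketch under stated assumptions, not the paper's evaluation code: `mask_fields`, `accuracy_under_missingness`, and `toy_predict` are hypothetical names, the record fields are illustrative, and `toy_predict` is a trivial stand-in for a real triage model.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]


def mask_fields(record: Record, fields: Sequence[str], rate: float,
                rng: random.Random) -> Record:
    """Copy `record`, setting each field in `fields` to None with probability `rate`."""
    return {
        key: (None if key in fields and rng.random() < rate else value)
        for key, value in record.items()
    }


def accuracy_under_missingness(
    records: List[Record],
    labels: List[int],
    predict: Callable[[Record], int],
    fields: Sequence[str],
    rates: Sequence[float],
    seed: int = 0,
) -> Dict[float, float]:
    """Return accuracy of `predict` at each missingness rate in `rates`."""
    if not records:
        raise ValueError("records must be non-empty")
    results: Dict[float, float] = {}
    for rate in rates:
        rng = random.Random(seed)  # reseed per rate so runs are reproducible
        hits = sum(
            predict(mask_fields(rec, fields, rate, rng)) == label
            for rec, label in zip(records, labels)
        )
        results[rate] = hits / len(records)
    return results


def toy_predict(record: Record) -> int:
    """Hypothetical stand-in model: acuity 2 if heart rate > 120, else 3."""
    heart_rate = record.get("heart_rate")
    if heart_rate is None:
        return 3  # arbitrary fallback when the vital sign is missing
    return 2 if heart_rate > 120 else 3
```

Calling `accuracy_under_missingness(records, labels, toy_predict, ["heart_rate"], [0.0, 0.25, 0.5])` gives one accuracy value per rate; comparing those curves across models is a simple way to contrast their sensitivity to missing inputs.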
Innovation

Methods, ideas, or system contributions that make the work stand out.

Assessing the robustness of LLMs in emergency triage
Analyzing intersectional biases via counterfactual methods
Comparing continued pre-training and in-context learning approaches
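
The counterfactual idea listed above can be illustrated by rewriting only the sex and race fields of each record and comparing the model's output across intersections. This is a hedged sketch, not the authors' implementation: the function names, the field names `sex` and `race`, the group labels, and the deliberately biased `toy_predict` are all assumptions made for demonstration.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Sequence, Tuple

Patient = Dict[str, Any]
Pair = Tuple[str, str]


def counterfactual_means(
    patients: List[Patient],
    predict: Callable[[Patient], int],
    sexes: Sequence[str],
    races: Sequence[str],
) -> Dict[Pair, float]:
    """Mean predicted acuity when every patient is rewritten to each (sex, race)."""
    if not patients:
        raise ValueError("patients must be non-empty")
    means: Dict[Pair, float] = {}
    for sex, race in product(sexes, races):
        # Only the two demographic fields change; clinical fields stay fixed.
        total = sum(predict({**p, "sex": sex, "race": race}) for p in patients)
        means[(sex, race)] = total / len(patients)
    return means


def sex_gap_by_race(means: Dict[Pair, float], sex_a: str, sex_b: str,
                    races: Sequence[str]) -> Dict[str, float]:
    """Difference in mean acuity between two sexes, computed within each race."""
    return {race: means[(sex_a, race)] - means[(sex_b, race)] for race in races}


def toy_predict(patient: Patient) -> int:
    """Hypothetical model with an injected intersectional bias, for demo only."""
    level = 2 if patient["heart_rate"] > 120 else 3
    if patient["sex"] == "F" and patient["race"] == "group_2":
        level += 1  # artificial bias so the analysis has something to find
    return level
```

With a model like `toy_predict`, `sex_gap_by_race` returns a zero gap for `group_1` and a nonzero gap for `group_2`, which mirrors the pattern described in the abstract: a sex-based difference that shows up only within a particular racial group.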