AI Summary
This study systematically evaluates the robustness and fairness of large language models (LLMs) in emergency triage, focusing on distributional shift, missing data handling, and intersectional bias across sex and race. We propose a multi-strategy LLM triage evaluation framework integrating continual pretraining, in-context learning, and hybrid machine learning, augmented with counterfactual reasoning and robustness diagnostics. Our work is the first to empirically uncover significant intersectional bias in clinical triage: LLMs exhibit markedly reduced recommendation consistency for Black women, revealing implicit demographic preferences that may compromise real-world decision-making. Experiments demonstrate that LLMs outperform traditional models under data scarcity and distributional shift; however, their fairness deficiencies remain pronounced. This research provides critical empirical evidence and methodological foundations for robust deployment and bias mitigation of AI in healthcare.
Abstract
Large Language Models (LLMs) have shown promise in clinical decision support, yet their application to triage remains underexplored. We systematically investigate the capabilities of LLMs in emergency department triage through two key dimensions: (1) robustness to distribution shifts and missing data, and (2) counterfactual analysis of intersectional biases across sex and race. We assess multiple LLM-based approaches, ranging from continued pre-training to in-context learning, as well as machine learning approaches. Our results indicate that LLMs exhibit superior robustness, and we investigate the key factors contributing to the promising LLM-based approaches. Furthermore, in this setting, we identify gaps in LLM preferences that emerge in particular intersections of sex and race. LLMs generally exhibit sex-based differences, but they are most pronounced in certain racial groups. These findings suggest that LLMs encode demographic preferences that may emerge in specific clinical contexts or particular combinations of characteristics.
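As an illustration of the counterfactual analysis described above, the following minimal Python sketch shows one way such a probe could be set up. It is not the authors' implementation. The note template, the abstract group labels (`group_a`, `group_b`), the numeric acuity score, and the `toy_predict` stub are all assumptions made for illustration; in practice `predict` would wrap a call to the model under evaluation.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical attribute values; the real study's categories may differ.
SEXES = ["female", "male"]
RACES = ["group_a", "group_b"]

Group = Tuple[str, str]


def counterfactual_variants(template: str) -> Dict[Group, str]:
    """Fill one note template with every (sex, race) combination.

    All other clinical details stay fixed, so any change in the
    prediction is attributable to the demographic terms alone.
    """
    return {
        (sex, race): template.format(sex=sex, race=race)
        for sex, race in product(SEXES, RACES)
    }


def mean_acuity_by_group(
    templates: List[str], predict: Callable[[str], float]
) -> Dict[Group, float]:
    """Average the predicted acuity per (sex, race) group over all templates."""
    totals = {group: 0.0 for group in product(SEXES, RACES)}
    for template in templates:
        for group, text in counterfactual_variants(template).items():
            totals[group] += predict(text)
    return {group: total / len(templates) for group, total in totals.items()}


def sex_gap_by_race(means: Dict[Group, float]) -> Dict[str, float]:
    """Female-minus-male difference in mean acuity within each race group."""
    return {race: means[("female", race)] - means[("male", race)] for race in RACES}


def toy_predict(text: str) -> float:
    """Synthetic stand-in for a model, with a gap injected on purpose
    for one intersection so the metric has something to detect."""
    if "female" in text and "group_b" in text:
        return 4.0
    return 3.0


templates = ["Patient is a {sex} {race} adult presenting with chest pain."]
gaps = sex_gap_by_race(mean_acuity_by_group(templates, toy_predict))
print(gaps)
```

Reporting the sex gap separately for each race group, rather than pooling across groups, is what lets an intersection-specific difference show up; a pooled average could mask a gap that is concentrated in a single group.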