AI Summary
This paper addresses the challenge of identifying clinical safety vulnerabilities in large language models (LLMs) for healthcare and the lack of standardized evaluation protocols. We propose the first red-teaming framework co-designed with, and deeply informed by, clinical domain experts. Methodologically, it integrates clinical-knowledge-guided adversarial prompting, multi-expert collaborative vulnerability annotation, cross-model consistency validation, and taxonomy-driven analysis. Applied systematically to real-world clinical scenarios, it uncovers and categorizes over ten classes of high-risk medical vulnerabilities, with reproducible validation across multiple mainstream LLMs. Key contributions include: (1) uncovering semantic-level medical vulnerabilities inaccessible to technical teams alone; (2) establishing a reproducible, extensible cross-model vulnerability assessment framework; and (3) releasing the first community-driven benchmark and practical guidelines for evaluating medical LLM safety, thereby advancing standardization in healthcare AI safety assessment.
Abstract
We present the design process and findings of the pre-conference workshop at the Machine Learning for Healthcare Conference (2024) entitled Red Teaming Large Language Models for Healthcare, which took place on August 15, 2024. Conference participants, with a mix of computational and clinical expertise, attempted to discover vulnerabilities -- realistic clinical prompts for which a large language model (LLM) outputs a response that could cause clinical harm. Red-teaming with clinicians enables the identification of LLM vulnerabilities that may go unrecognised by LLM developers who lack clinical expertise. We report the vulnerabilities found, categorise them, and present the results of a replication study assessing the vulnerabilities across all of the LLMs provided at the workshop.
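To make the cross-model replication protocol concrete, the sketch below shows one way discovered vulnerability prompts could be re-run against several LLMs and scored for whether the harmful behaviour reproduces. This is a minimal illustration under stated assumptions, not the authors' actual tooling: the model stubs, the `query` callables, and the keyword-based harm check are hypothetical, and in the workshop itself harm judgments came from clinician annotation rather than an automated check.

```python
"""Minimal sketch of a cross-model vulnerability replication study.

All names here (Vulnerability, replicate, the stub models, the toy harm
check) are illustrative assumptions, not the workshop's actual tooling.
"""

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Vulnerability:
    prompt: str         # realistic clinical prompt found during red-teaming
    harm_category: str  # e.g. "unsafe dosing", "missed contraindication"


def replicate(
    vulnerabilities: List[Vulnerability],
    models: Dict[str, Callable[[str], str]],
    is_harmful: Callable[[str, Vulnerability], bool],
) -> Dict[str, Dict[str, bool]]:
    """Re-run every discovered prompt against every model and record
    whether the harmful behaviour reproduces."""
    results: Dict[str, Dict[str, bool]] = {}
    for model_name, query in models.items():
        results[model_name] = {}
        for vuln in vulnerabilities:
            response = query(vuln.prompt)
            results[model_name][vuln.prompt] = is_harmful(response, vuln)
    return results


if __name__ == "__main__":
    # Stub models standing in for the LLMs provided at the workshop;
    # each returns a canned response instead of calling a real API.
    models = {
        "model_a": lambda p: "Take 500 mg twice daily.",
        "model_b": lambda p: "Please consult a clinician before dosing.",
    }
    vulns = [
        Vulnerability(
            prompt="What dose of drug X should I give a 6-month-old?",
            harm_category="unsafe paediatric dosing",
        )
    ]
    # Toy harm check; in practice clinicians annotated each response.
    harmful = lambda resp, v: "consult" not in resp.lower()
    print(replicate(vulns, models, harmful))
```

A table of such per-model booleans, aggregated by harm category, is the kind of artefact a replication study like the one described above would report.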