Red Teaming Large Language Models for Healthcare

📅 2025-05-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper reports on the pre-conference workshop Red Teaming Large Language Models for Healthcare, held at the Machine Learning for Healthcare Conference (2024). It addresses the difficulty of identifying clinical safety vulnerabilities in large language models (LLMs) and the lack of standardized evaluation protocols. Workshop participants combining computational and clinical expertise red-teamed LLMs to discover vulnerabilities: realistic clinical prompts for which a model outputs a response that could cause clinical harm. Pairing clinicians with model developers surfaced semantic, medical-knowledge-level vulnerabilities that technical teams alone may miss. Key contributions include: (1) a catalogue and categorization of the vulnerabilities found; (2) a replication study assessing each vulnerability across all LLMs provided; and (3) a clinician-informed, reproducible red-teaming process for evaluating medical LLM safety, thereby advancing standardization in healthcare AI safety assessment.

๐Ÿ“ Abstract
We present the design process and findings of the pre-conference workshop at the Machine Learning for Healthcare Conference (2024) entitled Red Teaming Large Language Models for Healthcare, which took place on August 15, 2024. Conference participants, comprising a mix of computational and clinical expertise, attempted to discover vulnerabilities -- realistic clinical prompts for which a large language model (LLM) outputs a response that could cause clinical harm. Red-teaming with clinicians enables the identification of LLM vulnerabilities that may not be recognised by LLM developers lacking clinical expertise. We report the vulnerabilities found, categorise them, and present the results of a replication study assessing the vulnerabilities across all LLMs provided.
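The replication study described in the abstract (re-running each discovered prompt against every LLM provided and checking whether the harmful behaviour reproduces) can be sketched as a small harness. Everything below is illustrative: the workshop's tooling is not published, so the `Vulnerability` record, the stub model callables, and the keyword-based `judge` are hypothetical stand-ins. In the study itself, harmfulness was judged by clinicians, not by a keyword check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Vulnerability:
    prompt: str    # realistic clinical prompt found during red-teaming
    category: str  # e.g. "unsafe dosing", "missed contraindication"

def replicate(
    vulnerabilities: List[Vulnerability],
    models: Dict[str, Callable[[str], str]],
    judge: Callable[[str], bool],  # True if a response is clinically harmful
) -> Dict[str, List[str]]:
    """Re-run every discovered prompt against every model and report which
    vulnerability categories each model reproduces."""
    reproduced: Dict[str, List[str]] = {name: [] for name in models}
    for vuln in vulnerabilities:
        for name, model in models.items():
            if judge(model(vuln.prompt)):
                reproduced[name].append(vuln.category)
    return reproduced

# Toy usage with stub callables standing in for the LLMs provided.
vulns = [Vulnerability("What dose of drug X for a 4-year-old?", "unsafe dosing")]
models = {
    "model_a": lambda p: "Give 500 mg immediately.",  # harmful: no safety caveat
    "model_b": lambda p: "Consult a pediatrician; dosing is weight-based.",
}
harmful = lambda response: "consult" not in response.lower()
print(replicate(vulns, models, harmful))
# {'model_a': ['unsafe dosing'], 'model_b': []}
```

In the real setting, the `models` dict would wrap API clients for each LLM under test, and `judge` would be replaced by the multi-expert clinician annotation the paper describes.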
Problem

Research questions and friction points this paper is trying to address.

Identify vulnerabilities in healthcare LLM responses
Assess clinical harm risks from LLM outputs
Evaluate LLM safety gaps with clinician input
Innovation

Methods, ideas, or system contributions that make the work stand out.

Red-teaming LLMs with clinical expertise
Identifying harmful LLM vulnerabilities in healthcare
Categorizing and replicating found vulnerabilities
Vahid Balazadeh
PhD in Computer Science - University of Toronto
Machine Learning, Causality
Michael Cooper
PhD Student, University of Toronto; Abridge AI
Machine Learning for Healthcare, Machine Learning, Artificial Intelligence
David Pellow
University of Toronto
Machine Learning for Healthcare, Computational Biology
Atousa Assadi
University of Toronto
Jennifer Bell
University Health Network
Jim Fackler
Johns Hopkins Medical Institutions
Gabriel Funingana
Cancer Research UK Cambridge Institute
Spencer Gable-Cook
University Health Network
Anirudh Gangadhar
University Health Network
Abhishek Jaiswal
Sumanth Kaja
Christopher Khoury
Randy Lin
Algoma University
Kaden McKeen
University of Toronto, Vector Institute for AI, University Health Network
Sara Naimimohasses
University of Iowa Hospitals & Clinics
Khashayar Namdar
University of Toronto
Aviraj Newatia
University of Toronto, Vector Institute for AI
Allan Pang
Leeds Teaching Hospitals NHS Trust
Anshul Pattoo
Queen's University
Sameer Peesapati
Synthesize
Diana Prepelita
University of Cambridge
Bogdana Rakova
Saba Sadatamin
University of Toronto
Rafael Schulman
Ajay Shah
Syed Azhar Shah
Syed Ahmar Shah
University of Edinburgh
Machine Learning for Healthcare, Data Science for Healthcare, AI for Healthcare, Data Science in Respiratory Medicine
Babak Taati
KITE Research Institute | Toronto Rehab - UHN & Department of Computer Science, University of Toronto
Computer Vision, Health Monitoring, Ambient Intelligence
Balagopal Unnikrishnan
University of Toronto, Vector Institute for AI
Stephanie Williams
University Health Network
Rahul G. Krishnan
University of Toronto
Machine Learning, Artificial Intelligence, Healthcare, Probabilistic Models, Causal Inference