Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings

📅 2025-07-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) for healthcare lack standardized, multi-stakeholder safety evaluation frameworks. Method: We propose the first multi-perspective red-teaming protocol—assessing risks from patient, clinician, and general-user viewpoints—and introduce PatientSafetyBench, a benchmark comprising 466 real-world scenarios spanning five safety risk categories. Our approach integrates expert human judgment with automated adversarial attacks to enable systematic, quantitative safety evaluation of medical LLMs. Results: Empirical evaluation on the MediPhi model series demonstrates the framework’s effectiveness in uncovering distinct, role-specific risk patterns across user groups. This work bridges a critical gap in fine-grained, role-aware safety assessment for healthcare AI and establishes a reproducible, scalable, and customizable evaluation paradigm to support trustworthy deployment of medical LLMs.

📝 Abstract
As the performance of large language models (LLMs) continues to advance, their adoption is expanding across a wide range of domains, including medicine. Integrating LLMs into medical applications raises critical safety concerns, both because they are used by people in diverse roles, e.g., patients and clinicians, and because model outputs can directly affect human health. Despite the domain-specific capabilities of medical LLMs, prior safety evaluations have largely focused on general safety benchmarks. In this paper, we introduce a safety evaluation protocol tailored to the medical domain from both patient and clinician perspectives, alongside general safety assessments, and we quantitatively analyze the safety of medical LLMs. We bridge a gap in the literature by building PatientSafetyBench, which contains 466 samples over 5 critical categories to measure safety from the patient's perspective. We apply our red-teaming protocols to the MediPhi model collection as a case study. To our knowledge, this is the first work to define safety evaluation criteria for medical LLMs through targeted red-teaming from three points of view (patient, clinician, and general user), establishing a foundation for safer deployment in medical domains.
Problem

Research questions and friction points this paper is trying to address.

Evaluating safety of medical LLMs from patient perspectives
Assessing clinician-focused safety risks in medical LLM outputs
Establishing multi-view safety benchmarks for healthcare AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Safety evaluation protocol tailored to the medical domain
PatientSafetyBench with 466 samples across 5 critical categories
Red-teaming from patient, clinician, and general-user perspectives