A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties

📅 2025-12-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Assessing safety risks of medical large language models (LLMs) in clinical decision-making faces high barriers: reliance on GPU clusters, commercial APIs, or sensitive real-world health data impedes broad community participation. To address this, we propose MedSafe—the first lightweight, fully reproducible safety evaluation framework for medical LLMs, executable on consumer-grade CPUs alone. MedSafe supports multi-specialty scenarios including emergency medicine, psychiatry, and primary care. It employs synthetically generated clinical notes—exempt from IRB approval—and introduces standardized test protocols for jailbreaking attacks (e.g., role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. Furthermore, it establishes a cross-specialty comparable safety scoring system. All components are open-sourced, leveraging only free, publicly available models and data. MedSafe enables zero-cost, highly reproducible, and scalable benchmarking of medical AI safety.

Technology Category

Application Category

📝 Abstract
Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU clusters, commercial API access, or protected health data -- barriers that limit community participation in this critical research area. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints. Our framework design covers multiple medical specialties stratified by clinical risk -- from high-risk domains such as emergency medicine and psychiatry to general practice -- addressing jailbreaking attacks (role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. All evaluation utilizes synthetic patient records requiring no IRB approval. The framework is designed to run entirely on consumer CPU hardware using freely available models, eliminating cost barriers. We present the framework specification including threat models, data generation methodology, evaluation protocols, and scoring rubrics. This proposal establishes a foundation for comparative security assessment of medical-specialist models and defense mechanisms, advancing the broader goal of ensuring safe and trustworthy medical AI systems.
Problem

Research questions and friction points this paper is trying to address.

Evaluates medical AI security vulnerabilities across clinical specialties
Assesses jailbreaking and privacy attacks using synthetic patient data
Provides reproducible framework for resource-constrained researchers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework uses synthetic patient records for privacy
Runs on consumer CPU hardware with free models
Covers jailbreaking and privacy attacks across specialties
🔎 Similar Papers