Exploring Membership Inference Vulnerabilities in Clinical Large Language Models

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Clinical large language models (LLMs) face non-negligible privacy risks from membership inference attacks (MIAs), yet their vulnerability in real-world healthcare settings remains underexplored. Method: We propose a clinical-semantic rewriting perturbation strategy, grounded in domain-specific knowledge, to better emulate realistic adversarial conditions, and conduct empirical loss-based MIAs against the Llemr clinical question-answering (QA) model. Results: Our evaluation reveals limited but measurable membership-information leakage in current clinical LLMs: while the models exhibit partial robustness, substantive privacy vulnerabilities persist. This work pioneers the integration of clinical domain knowledge into perturbation design, enabling fine-grained, healthcare-aware privacy assessment. It establishes the first empirically grounded framework for evaluating privacy leakage in clinical LLMs and provides both methodological guidance and empirical evidence to inform the development of domain-specific defenses.
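
To make the attack model concrete, here is a minimal sketch of a loss-based MIA against a causal LM, in the spirit of the attacks evaluated here. The model name, scorer, and threshold are illustrative placeholders, not the paper's actual Llemr setup.

```python
# Minimal loss-based membership inference sketch against a causal LM.
# MODEL_NAME is a placeholder; the paper's target is the Llemr clinical QA model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "some-clinical-llm"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def sequence_loss(text: str) -> float:
    """Mean token-level cross-entropy of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    outputs = model(**inputs, labels=inputs["input_ids"])
    return outputs.loss.item()

def infer_membership(text: str, threshold: float) -> bool:
    """Flag a record as a training member when its loss falls below a
    calibrated threshold: memorized examples tend to receive lower loss."""
    return sequence_loss(text) < threshold
```

The threshold is typically calibrated on records known to be outside the training set; sweeping it traces the ROC curve used to quantify leakage.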

📝 Abstract
As large language models (LLMs) become progressively more embedded in clinical decision support, documentation, and patient-information systems, ensuring their privacy and trustworthiness has emerged as a pressing challenge for the healthcare sector. Fine-tuning LLMs on sensitive electronic health record (EHR) data improves domain alignment but also raises the risk of exposing patient information through model behavior. In this work-in-progress, we present an exploratory empirical study of membership inference vulnerabilities in clinical LLMs, focusing on whether adversaries can infer if specific patient records were used during model training. Using a state-of-the-art clinical question-answering model, Llemr, we evaluate both canonical loss-based attacks and a domain-motivated, paraphrasing-based perturbation strategy that more realistically reflects clinical adversarial conditions. Our preliminary findings reveal limited but measurable membership leakage, suggesting that current clinical LLMs offer partial resistance yet remain susceptible to subtle privacy risks that could undermine trust in clinical AI adoption. These results motivate continued development of context-aware, domain-specific privacy evaluations and defenses, such as differential-privacy fine-tuning and paraphrase-aware training, to strengthen the security and trustworthiness of healthcare AI systems.
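
The paraphrasing-based strategy can be read as a neighborhood-comparison test: a memorized record should score a noticeably lower loss than meaning-preserving rewrites of itself. Below is a hedged sketch under that reading, reusing the `sequence_loss` scorer from the earlier snippet; `paraphrase_fn` stands in for a clinical-semantic rewriter (e.g., swapping drug names for synonyms or reordering findings) and is an assumption, not the paper's implementation.

```python
# Paraphrase-based (neighborhood-comparison) membership inference sketch.
# Assumes `sequence_loss` from the previous snippet; `paraphrase_fn` is a
# user-supplied, meaning-preserving clinical rewriter (hypothetical).
from statistics import mean
from typing import Callable, List

def neighborhood_score(text: str,
                       paraphrase_fn: Callable[[str, int], List[str]],
                       n_neighbors: int = 8) -> float:
    """Mean loss of paraphrased neighbors minus loss of the original.
    Training members are memorized near-verbatim, so the original scores
    well below its paraphrases; for non-members the gap is smaller."""
    neighbors = paraphrase_fn(text, n_neighbors)
    return mean(sequence_loss(p) for p in neighbors) - sequence_loss(text)

def infer_membership_by_gap(text: str,
                            paraphrase_fn: Callable[[str, int], List[str]],
                            gap_threshold: float) -> bool:
    """Flag a record as a member when the loss gap exceeds a threshold."""
    return neighborhood_score(text, paraphrase_fn) > gap_threshold
```

Comparing against paraphrases rather than a fixed loss threshold controls for records that are intrinsically easy or hard to predict, which is what makes the perturbation-based test closer to realistic clinical adversarial conditions.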
Problem

Research questions and friction points this paper is trying to address.

Assessing membership inference risks in clinical language models
Evaluating patient data leakage from model training behaviors
Identifying privacy vulnerabilities in healthcare AI systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Membership inference attacks on clinical LLMs
Paraphrasing-based perturbation strategy evaluation
Differential privacy fine-tuning for healthcare AI (see the sketch below)
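
As one concrete instance of the differential-privacy defense named above, the sketch below fine-tunes with DP-SGD via the Opacus library. Here `model` is a causal LM as in the earlier snippets, `train_loader` is a hypothetical DataLoader over tokenized EHR examples, and all hyperparameters are placeholders rather than values from the paper.

```python
# Hedged DP-SGD fine-tuning sketch using Opacus (not the paper's code).
# Assumes `model` (a causal LM) and `train_loader` are defined elsewhere.
import torch
from opacus import PrivacyEngine

model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,  # hypothetical loader of tokenized EHR records
    noise_multiplier=1.0,      # Gaussian noise scale (placeholder)
    max_grad_norm=1.0,         # per-sample gradient clipping bound (placeholder)
)

for batch in train_loader:
    optimizer.zero_grad()
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()            # Opacus accumulates per-sample gradients
    optimizer.step()           # applies the clipped, noised aggregate update
```

Per-sample clipping bounds any one patient's influence on each update, and the added noise converts that bound into a formal (ε, δ) guarantee that directly limits what a membership inference adversary can learn.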