🤖 AI Summary
Medical multimodal large language models (MLLMs) carry high intellectual-property value because medical data is scarce and subject to stringent privacy constraints, yet they remain vulnerable to black-box model stealing. This is particularly true for complex tasks such as radiology report generation, where existing classification-oriented attacks fail to generalize.
Method: We propose ADA-STEAL (Adversarial Domain Alignment), the first black-box stealing framework tailored to radiology report generation MLLMs. It replicates the victim's functionality across domains without any medical data: the attacker issues black-box queries using publicly available natural images and injects adversarial noise to align those images with the victim's domain-specific distribution.
Contribution/Results: Extensive experiments on the IU X-RAY and MIMIC-CXR datasets show that the stolen models generate radiology reports of quality comparable to the victim's, despite the attacker having no access to medical data. ADA-STEAL thus establishes a new paradigm for the security evaluation of medical MLLMs, informing both threat modeling and the development of defenses.
📝 Abstract
Medical multimodal large language models (MLLMs) are becoming an instrumental part of healthcare systems, assisting medical personnel with decision making and results analysis. Models for radiology report generation are able to interpret medical imagery, thus reducing the workload of radiologists. As medical data is scarce and protected by privacy regulations, medical MLLMs represent valuable intellectual property. However, these assets are potentially vulnerable to model stealing, where attackers aim to replicate their functionality via black-box access. So far, model stealing for the medical domain has focused on classification; however, existing attacks are not effective against MLLMs. In this paper, we introduce Adversarial Domain Alignment (ADA-STEAL), the first stealing attack against medical MLLMs. ADA-STEAL relies on natural images, which are public and widely available, as opposed to their medical counterparts. We show that data augmentation with adversarial noise is sufficient to overcome the data distribution gap between natural images and the domain-specific distribution of the victim MLLM. Experiments on the IU X-RAY and MIMIC-CXR radiology datasets demonstrate that Adversarial Domain Alignment enables attackers to steal the medical MLLM without any access to medical data.
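The adversarial augmentation described in the abstract can be illustrated with a minimal FGSM-style sketch: perturb a natural image within an L∞ budget so that a surrogate scoring function rates it as more "victim-domain-like". This is an assumption-laden toy, not the paper's implementation; the linear surrogate `surrogate_score`, the sign-gradient step, and the budget `epsilon` are all illustrative stand-ins for whatever alignment objective and model the attack actually uses.

```python
import random

def surrogate_score(x, w):
    # Toy linear surrogate (assumption): higher score = more "medical-like".
    return sum(xi * wi for xi, wi in zip(x, w))

def sign(v):
    # Sign function used by FGSM-style perturbations.
    return (v > 0) - (v < 0)

def fgsm_align(x, w, epsilon=0.03):
    # For a linear score, the gradient w.r.t. each pixel is just w_i, so an
    # FGSM step moves each pixel by +/- epsilon in the sign of w_i; clipping
    # keeps the result a valid image in [0, 1].
    return [min(max(xi + epsilon * sign(wi), 0.0), 1.0) for xi, wi in zip(x, w)]

random.seed(0)
x = [random.random() for _ in range(64)]       # stand-in for a natural image patch
w = [random.gauss(0.0, 1.0) for _ in range(64)]  # stand-in surrogate weights

x_adv = fgsm_align(x, w)
# The perturbed image stays within the epsilon budget of the original while
# its surrogate score can only increase, i.e. it moves toward the target domain.
```

In the actual attack the surrogate objective would be derived from the victim's black-box responses rather than fixed weights, but the shape of the step (bounded, sign-of-gradient noise added to a public natural image) is the same.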