🤖 AI Summary
This work addresses the critical vulnerability of medical vision-language models (Med-VLMs) to minor perturbations in natural language prompts during ultrasound image analysis, which poses a serious threat to their clinical reliability. We propose the first adversarial evaluation framework that integrates human-like rewriting with minimal-edit strategies, leveraging large language models to generate clinically plausible prompts that are semantically similar yet subtly altered. Using this framework, we systematically assess the robustness of state-of-the-art Med-VLMs on multiple-choice ultrasound question-answering tasks. Our experiments reveal widespread fragility: even minimal prompt perturbations significantly degrade model performance, attack success correlates positively with the attacker LLM’s capability, and model confidence often misaligns with incorrect predictions. This study establishes a new benchmark and analytical perspective for improving the stability of Med-VLMs in clinical deployment.
📝 Abstract
Ultrasound is widely used in clinical practice due to its portability, cost-effectiveness, safety, and real-time imaging capabilities. However, image acquisition and interpretation remain highly operator-dependent, motivating the development of robust AI-assisted analysis methods. Vision-language models (VLMs) have recently demonstrated strong multimodal reasoning capabilities and competitive performance in medical image analysis, including ultrasound. However, emerging evidence highlights significant concerns about their trustworthiness. In particular, adversarial robustness is critical because Med-VLMs operate via natural-language instructions, rendering prompt formulation a realistic and practically exploitable point of vulnerability. Small variations (typos, shorthand, underspecified requests, or ambiguous wording) can meaningfully shift model outputs. We propose a scalable adversarial evaluation framework that leverages a large language model (LLM) to generate clinically plausible adversarial prompt variants via "humanized" rewrites and minimal edits that mimic routine clinical communication. Using ultrasound multiple-choice question-answering benchmarks, we systematically assess the vulnerability of state-of-the-art Med-VLMs to these attacks, examine how attacker LLM capacity influences attack success, analyze the relationship between attack success and model confidence, and identify consistent failure patterns across models. Our results highlight realistic robustness gaps that must be addressed for safe clinical translation. Code will be released publicly following the review process.
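To make the "minimal edit" attack family concrete, the sketch below shows what such perturbations might look like in practice: single-character typos and clinical shorthand substitutions that keep a prompt semantically intact. This is a minimal illustration under our own assumptions; the edit operations, the shorthand table, and all function names here are hypothetical, not the authors' released framework (which uses an LLM to generate the variants).

```python
import random

# Hypothetical shorthand table mimicking routine clinical abbreviations.
SHORTHAND = {
    "patient": "pt",
    "history": "hx",
    "diagnosis": "dx",
    "ultrasound": "US",
}

def swap_adjacent_chars(text, rng):
    """Introduce a single typo by swapping two adjacent letters."""
    idxs = [i for i in range(len(text) - 1)
            if text[i].isalpha() and text[i + 1].isalpha()]
    if not idxs:
        return text
    i = rng.choice(idxs)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def apply_shorthand(text, rng):
    """Replace one full word with its clinical shorthand, once."""
    candidates = [w for w in SHORTHAND if w in text]
    if not candidates:
        return text
    w = rng.choice(candidates)
    return text.replace(w, SHORTHAND[w], 1)

def minimal_edit_variants(prompt, n=3, seed=0):
    """Generate n lightly perturbed variants of a question prompt."""
    rng = random.Random(seed)
    ops = [swap_adjacent_chars, apply_shorthand]
    return [rng.choice(ops)(prompt, rng) for _ in range(n)]

original = "Given the ultrasound image and patient history, what is the diagnosis?"
variants = minimal_edit_variants(original)
```

Each variant would then be paired with the unchanged image and answer options, and a Med-VLM's answers across variants compared against its answer to the original prompt to measure robustness.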