MedFM-Robust: Benchmarking Robustness of Medical Foundation Models

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The reliability of current medical foundation models in real-world clinical settings remains uncertain because they have not been systematically evaluated for robustness. This work introduces the first unified robustness benchmark for medical vision-language and segmentation foundation models, covering clinical tasks such as visual question answering, radiology report generation, and image segmentation. The framework applies diverse realistic perturbations (adversarial attacks, domain shifts, and image degradations) to simulate non-ideal clinical conditions. Experiments show that state-of-the-art medical foundation models degrade significantly under these perturbations, which exposes their vulnerability in practical deployment. The benchmark offers a standard for assessing reliability and informs the safe and effective clinical translation of medical AI systems.
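
The evaluation idea summarized above (perturb the input, then compare scores against the clean baseline) can be illustrated with a minimal sketch. This is not the benchmark's actual code: the function names, the Gaussian-noise degradation, the relative-drop metric, and the toy scorer are all assumptions chosen for illustration, and a real study would substitute an actual model and task metric for `score_fn`.

```python
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Return a noisy copy of `image` (rows of floats in [0, 1]), clipped to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def relative_drop(clean_score, perturbed_score):
    """Fraction of the clean score lost under perturbation (0.0 means no loss)."""
    if clean_score == 0:
        raise ValueError("clean_score must be non-zero")
    return (clean_score - perturbed_score) / clean_score


def evaluate_robustness(score_fn, images, sigmas):
    """Average `score_fn` on clean images and at each noise level.

    `score_fn` is a hypothetical stand-in for "run the model and compute
    the task metric"; it maps one image to a number.
    """
    clean = sum(score_fn(img) for img in images) / len(images)
    report = {}
    for sigma in sigmas:
        noisy = [add_gaussian_noise(img, sigma, seed=i)
                 for i, img in enumerate(images)]
        perturbed = sum(score_fn(img) for img in noisy) / len(noisy)
        report[sigma] = relative_drop(clean, perturbed)
    return report


def toy_score(image):
    """Toy scorer (assumption): 1 minus mean absolute deviation from 0.5."""
    pixels = [px for row in image for px in row]
    return 1.0 - sum(abs(px - 0.5) for px in pixels) / len(pixels)


if __name__ == "__main__":
    flat_gray = [[0.5] * 4 for _ in range(4)]
    print(evaluate_robustness(toy_score, [flat_gray, flat_gray], [0.0, 0.1, 0.3]))
```

Reporting a relative drop rather than a raw score makes models with different clean baselines comparable, which is one reasonable design choice among several.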

📝 Abstract

Medical foundation models (MedFMs) have emerged as transformative tools in healthcare, demonstrating capabilities across diverse clinical applications. These models can be broadly categorized into two paradigms: Medical Vision-Language Models (Med-VLMs) and segmentation foundation models. Med-VLMs range from medical-specialized models such as LLaVA-Med and MedGemma, to general-purpose models like GPT-4o and Gemini, all capable of medical image understanding tasks including visual question answering (VQA), report generation, and visual grounding. Concurrently, the Segment Anything Model (SAM) has catalyzed a new generation of medical segmentation models, with adaptations like SAM-Med2D and MedSAM. The widespread clinical deployment of these models thus necessitates rigorous evaluation of their reliability under real-world conditions.

Problem

Research questions and friction points this paper is trying to address.

Medical Foundation Models

Robustness

Benchmarking

Clinical Deployment

Reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Medical Foundation Models

Robustness Benchmarking

Vision-Language Models