MedLayBench-V: A Large-Scale Benchmark for Expert-Lay Semantic Alignment in Medical Vision Language Models

📅 2026-04-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses a key limitation of existing medical vision-language models: because they are trained predominantly on specialized literature, they struggle to explain imaging findings to patients in lay terms, and no multimodal benchmark exists for evaluating semantic alignment between clinicians and non-experts. To bridge this gap, the authors introduce MedLayBench-V, the first large-scale multimodal benchmark designed explicitly for aligning expert and layperson semantics in medical contexts. The dataset is built with a Structured Concept-Grounded Refinement (SCGR) pipeline that integrates Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) with fine-grained entity constraints, generating hallucination-free lay descriptions that remain semantically equivalent to the professional text. The benchmark thus provides a high-quality, verifiable foundation for training and evaluating medical vision-language models tailored to patient communication.
πŸ“ Abstract
Medical Vision-Language Models (Med-VLMs) have achieved expert-level proficiency in interpreting diagnostic imaging. However, current models are predominantly trained on professional literature, limiting their ability to communicate findings in the lay register required for patient-centered care. While text-centric research has actively developed resources for simplifying medical jargon, there is a critical absence of large-scale multimodal benchmarks designed to facilitate lay-accessible medical image understanding. To bridge this resource gap, we introduce MedLayBench-V, the first large-scale multimodal benchmark dedicated to expert-lay semantic alignment. Unlike naive simplification approaches that risk hallucination, our dataset is constructed via a Structured Concept-Grounded Refinement (SCGR) pipeline. This method enforces strict semantic equivalence by integrating Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) with micro-level entity constraints. MedLayBench-V provides a verified foundation for training and evaluating next-generation Med-VLMs capable of bridging the communication divide between clinical experts and patients.
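The abstract does not detail how SCGR enforces semantic equivalence. As a hedged illustration of the general idea only (not the authors' implementation), the sketch below checks that an expert sentence and its lay rewrite ground to the same set of UMLS CUIs; the lexicon, function names, and CUI lookup are illustrative stand-ins for a real UMLS-backed entity linker.

```python
# Minimal sketch of CUI-grounded equivalence checking, assuming a toy
# term-to-CUI lexicon. A real pipeline would use a UMLS entity linker
# rather than substring lookup.

TOY_CUI_LEXICON = {
    "myocardial infarction": "C0027051",
    "heart attack": "C0027051",        # lay synonym maps to the same CUI
    "hypertension": "C0020538",
    "high blood pressure": "C0020538", # lay synonym maps to the same CUI
}

def extract_cuis(text: str) -> set[str]:
    """Return the set of CUIs whose surface forms appear in `text`."""
    lowered = text.lower()
    return {cui for term, cui in TOY_CUI_LEXICON.items() if term in lowered}

def semantically_equivalent(expert_text: str, lay_text: str) -> bool:
    """Strict equivalence: identical CUI sets (no dropped or added concepts)."""
    return extract_cuis(expert_text) == extract_cuis(lay_text)

expert = "Findings consistent with myocardial infarction and hypertension."
lay = "The scan shows signs of a heart attack and high blood pressure."
print(semantically_equivalent(expert, lay))  # True: same concept set
```

Anchoring both registers to shared concept identifiers, rather than comparing surface wording, is what lets such a check flag a lay rewrite that drops or invents a clinical finding.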
Problem

Research questions and friction points this paper is trying to address.

Medical Vision-Language Models
lay language
semantic alignment
multimodal benchmark
patient communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

expert-lay alignment
medical vision-language models
semantic equivalence
UMLS CUIs
multimodal benchmark
Han Jang
Seoul National University, Department of Radiology, Seoul National University Hospital, The Advanced Imaging and Computational Neuroimaging (AICON) Laboratory
Junhyeok Lee
Johns Hopkins University, Center for Language and Signal Processing
Speech and Language Processing · Speech Processing · Speech Synthesis · Generative Model
Heeseong Eum
Seoul National University, Seoul National University College of Medicine, The Advanced Imaging and Computational Neuroimaging (AICON) Laboratory
Kyu Sung Choi
Assistant Professor, Department of Radiology, Seoul National University Hospital
Radiology · Neuroimage · Deep Learning · Neuro-Oncology