AMRG: Extend Vision Language Models for Automatic Mammography Report Generation

📅 2025-08-12

🤖 AI Summary
This work addresses automatic narrative mammography report generation, an underexplored task in medical AI. To tackle multi-view image reasoning, high-resolution visual understanding, and unstructured radiological language modeling, we propose AMRG, the first end-to-end framework for generating narrative mammography reports. We establish a reproducible vision-language model (VLM) benchmark on the DMID dataset by fine-tuning MedGemma-4B-it with LoRA, a parameter-efficient method, and by comparing multiple VLM backbones under a unified tuning protocol. The framework achieves ROUGE-L = 0.5691, METEOR = 0.6152, CIDEr = 0.5818, and BI-RADS classification accuracy = 0.5582, and qualitative analysis indicates improved diagnostic consistency and fewer hallucinations.


📝 Abstract
Mammography report generation is a critical yet underexplored task in medical AI, characterized by challenges such as multiview image reasoning, high-resolution visual cues, and unstructured radiologic language. In this work, we introduce AMRG (Automatic Mammography Report Generation), the first end-to-end framework for generating narrative mammography reports using large vision-language models (VLMs). Building upon MedGemma-4B-it, a domain-specialized, instruction-tuned VLM, we employ a parameter-efficient fine-tuning (PEFT) strategy via Low-Rank Adaptation (LoRA), enabling lightweight adaptation with minimal computational overhead. We train and evaluate AMRG on DMID, a publicly available dataset of paired high-resolution mammograms and diagnostic reports. This work establishes the first reproducible benchmark for mammography report generation, addressing a longstanding gap in multimodal clinical AI. We systematically explore LoRA hyperparameter configurations and conduct comparative experiments across multiple VLM backbones, including both domain-specific and general-purpose models under a unified tuning protocol. Our framework demonstrates strong performance across both language generation and clinical metrics, achieving a ROUGE-L score of 0.5691, METEOR of 0.6152, CIDEr of 0.5818, and BI-RADS accuracy of 0.5582. Qualitative analysis further highlights improved diagnostic consistency and reduced hallucinations. AMRG offers a scalable and adaptable foundation for radiology report generation and paves the way for future research in multimodal medical AI.
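
To make the ROUGE-L figure in the abstract easier to interpret, here is a minimal sketch of how a sentence-level ROUGE-L F1 score can be computed from the longest common subsequence (LCS) of tokens. It is an illustration only: the whitespace tokenization, the equal weighting of precision and recall, and the function names `lcs_length` and `rouge_l_f1` are assumptions, and the abstract does not say which ROUGE-L variant or toolkit the authors used. The example report strings are invented and are not taken from DMID.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L F1 with lowercase whitespace tokenization (an assumption)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of generated tokens that lie on the LCS
    recall = lcs / len(ref)      # share of reference tokens that lie on the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical report fragments, for illustration only.
reference_report = "oval mass in the upper outer quadrant of the left breast"
generated_report = "oval mass in the left breast"
print(round(rouge_l_f1(reference_report, generated_report), 4))
```

Because the score rewards in-order token overlap rather than clinical correctness, it is usually read alongside a clinical metric such as the BI-RADS accuracy reported above.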
Problem

Research questions and friction points this paper is trying to address.

Extend VLMs for mammography report generation
Address multiview image and high-resolution challenges
Establish reproducible benchmark for clinical AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MedGemma-4B-it VLM for mammography reports
Employs LoRA for lightweight parameter-efficient tuning
Trains on DMID dataset for high-resolution mammograms
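
The LoRA idea mentioned above can be sketched in a few lines of plain Python. LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so a layer computes W x + (alpha / r) * B (A x), where A has shape r x d_in and B has shape d_out x r. The sketch below is an illustration of that general technique, not the authors' training code: the layer dimensions, rank, and alpha value are hypothetical, and the hyperparameters the paper actually selected are not stated in this summary.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha):
    """Frozen base projection plus the scaled low-rank LoRA update."""
    r = len(A)                       # LoRA rank = number of rows in A
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters added by LoRA for one d_out x d_in weight."""
    return rank * (d_in + d_out)


# Hypothetical sizes, chosen only to show the scale of the savings.
d_out, d_in, rank = 2048, 2048, 8
full = d_out * d_in
added = lora_param_count(d_out, d_in, rank)
print(f"full: {full}, LoRA: {added}, ratio: {added / full:.4%}")

# B is conventionally initialized to zeros, so the adapted layer
# starts out identical to the frozen base layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B_zero = [[0.0], [0.0]]
print(lora_forward(W, A, B_zero, [2.0, 3.0], alpha=16))
```

Only A and B receive gradients during fine-tuning, which is why the approach is described as lightweight: the number of trainable values grows with r (d_in + d_out) rather than d_in * d_out.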
Nak-Jun Sung
National Cancer Center
Artificial Intelligence, Medical Image Processing, Computer Graphics, Physically based Simulation
Donghyun Lee
Research Institute, National Cancer Center Korea, 323, Ilsan-ro, Ilsandong-gu, Goyang-si, 10408, Gyeonggi-do, Republic of Korea
Bo Hwa Choi
Department of Radiology, National Cancer Center Korea, 323, Ilsan-ro, Ilsandong-gu, Goyang-si, 10408, Gyeonggi-do, Republic of Korea
Chae Jung Park
Research Institute, National Cancer Center Korea, 323, Ilsan-ro, Ilsandong-gu, Goyang-si, 10408, Gyeonggi-do, Republic of Korea