AMRG: Extend Vision Language Models for Automatic Mammography Report Generation

📅 2025-08-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses automatic narrative mammography report generation, an underexplored task in medical AI. To handle multi-view image reasoning, high-resolution visual understanding, and unstructured radiological language, we propose AMRG, the first end-to-end framework for this task. We fine-tune MedGemma-4B-it with LoRA, a parameter-efficient method, and compare multiple VLM backbones under a unified tuning protocol on the DMID dataset, which yields a reproducible benchmark. The framework reaches ROUGE-L = 0.5691, METEOR = 0.6152, CIDEr = 0.5818, and BI-RADS classification accuracy = 0.5582, and qualitative analysis indicates improved diagnostic consistency and fewer hallucinations.
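To make the ROUGE-L figure easier to interpret, here is a minimal sketch of the metric in its common F1 form, based on the longest common subsequence (LCS) of whitespace tokens. The function names, the lowercase whitespace tokenization, and the choice of F1 over a recall-weighted variant are assumptions for illustration; the paper's actual scoring toolkit and settings are not stated here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as the harmonic mean of LCS precision and LCS recall.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A generated report that reproduces most of the reference wording in the same order scores close to 1, while a report with no shared tokens scores 0.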

📝 Abstract

Mammography report generation is a critical yet underexplored task in medical AI, characterized by challenges such as multiview image reasoning, high-resolution visual cues, and unstructured radiologic language. In this work, we introduce AMRG (Automatic Mammography Report Generation), the first end-to-end framework for generating narrative mammography reports using large vision-language models (VLMs). Building upon MedGemma-4B-it, a domain-specialized, instruction-tuned VLM, we employ a parameter-efficient fine-tuning (PEFT) strategy via Low-Rank Adaptation (LoRA), enabling lightweight adaptation with minimal computational overhead. We train and evaluate AMRG on DMID, a publicly available dataset of paired high-resolution mammograms and diagnostic reports. This work establishes the first reproducible benchmark for mammography report generation, addressing a longstanding gap in multimodal clinical AI. We systematically explore LoRA hyperparameter configurations and conduct comparative experiments across multiple VLM backbones, including both domain-specific and general-purpose models under a unified tuning protocol. Our framework demonstrates strong performance across both language generation and clinical metrics, achieving a ROUGE-L score of 0.5691, METEOR of 0.6152, CIDEr of 0.5818, and BI-RADS accuracy of 0.5582. Qualitative analysis further highlights improved diagnostic consistency and reduced hallucinations. AMRG offers a scalable and adaptable foundation for radiology report generation and paves the way for future research in multimodal medical AI.
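One plausible way to obtain a clinical metric such as BI-RADS accuracy from free-text reports is to extract the category from the reference and the generated report and count exact matches. The sketch below is hypothetical: the regular expression, the function names, and the rule of skipping references without a stated category are assumptions, not the evaluation procedure described by the authors.

```python
import re

# Assumption: the category appears as "BI-RADS 4", "BIRADS: 4", or
# "BI-RADS category 4"; only the leading digit (0-6) is compared.
_BIRADS = re.compile(r"bi-?rads\s*(?:category\s*)?:?\s*([0-6])", re.IGNORECASE)


def extract_birads(report):
    """Return the first BI-RADS category found in the text, or None."""
    match = _BIRADS.search(report)
    return int(match.group(1)) if match else None


def birads_accuracy(references, generated):
    """Fraction of reports whose extracted category matches the reference.

    References with no extractable category are skipped; a generated
    report with no category counts as incorrect.
    """
    if len(references) != len(generated):
        raise ValueError("references and generated must have equal length")
    scored = 0
    correct = 0
    for ref, gen in zip(references, generated):
        truth = extract_birads(ref)
        if truth is None:
            continue
        scored += 1
        if extract_birads(gen) == truth:
            correct += 1
    return correct / scored if scored else 0.0
```

Counting a missing category as an error penalizes reports that omit the assessment altogether, which is a deliberate choice in this sketch rather than a documented property of the benchmark.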

Problem

Research questions and friction points this paper is trying to address.

Extend VLMs for mammography report generation

Address multiview image and high-resolution challenges

Establish reproducible benchmark for clinical AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MedGemma-4B-it VLM for mammography reports

Employs LoRA for lightweight parameter-efficient tuning

Trains on DMID dataset for high-resolution mammograms
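As a rough illustration of why LoRA is lightweight, the sketch below implements the standard low-rank update h = Wx + (alpha / r) * B(Ax) in plain Python and counts the trainable parameters for one weight matrix. The function names, the toy matrices, and the 4096 x 4096 layer size are assumptions for illustration; they are not the paper's configuration or the MedGemma architecture.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Frozen weight W plus a scaled low-rank update.

    W: d_out x d_in (frozen), A: r x d_in, B: d_out x r (both trainable).
    """
    rank = len(A)
    scale = alpha / rank
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in A and B, versus d_out * d_in for full fine-tuning."""
    return rank * (d_in + d_out)


# Toy example: with B initialized to zero the adapted layer matches the base layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]              # rank 1
B_zero = [[0.0], [0.0]]
x = [1.0, 2.0]
unchanged = lora_forward(W, A, B_zero, x, alpha=2.0)

# Hypothetical layer size: 4096 x 4096 with rank 8.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, 8)
ratio = lora / full
```

Under these assumed dimensions the adapter trains 65,536 parameters against 16,777,216 for the full matrix, which is about 0.4% of the layer.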

🔎 Similar Papers

Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography