LLM-Bootstrapped Targeted Finding Guidance for Factual MLLM-based Medical Report Generation

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of factual inconsistency—such as lesion omission or hallucination—that hinders the clinical deployment of multimodal large language models (MLLMs) in medical report generation. To mitigate this issue, the authors propose Fact-Flow, a framework that decouples visual fact extraction from text generation: it first identifies structured clinical findings from medical images and then uses these findings to guide the MLLM in producing factually accurate reports. Notably, the framework leverages a large language model to automatically construct a labeled dataset of medical findings, circumventing the need for costly manual annotation. Evaluated on two disease-specific datasets, Fact-Flow significantly improves factual accuracy while maintaining high-quality narrative generation, outperforming current state-of-the-art methods.
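The decoupled pipeline described above can be sketched as a minimal two-stage flow. All function names and the stub findings below are illustrative placeholders, not the authors' actual code: stage one predicts structured findings from the image, stage two serializes them into the MLLM prompt so the report is grounded in explicit facts.

```python
# Hypothetical sketch of a Fact-Flow-style two-stage pipeline.
# Names and findings are placeholders, not the paper's implementation.

def extract_findings(image):
    """Stage 1: predict structured clinical findings from the image.
    A stub list stands in for the visual finding classifier."""
    return ["cardiomegaly: present", "pleural effusion: absent"]

def build_prompt(findings):
    """Serialize findings into the generation prompt so the MLLM is
    guided by an explicit factual basis rather than raw image features."""
    facts = "\n".join(f"- {f}" for f in findings)
    return (
        "Write a radiology report consistent with these findings:\n"
        f"{facts}\nReport:"
    )

def generate_report(image, mllm):
    """Stage 2: the MLLM produces a report conditioned on the findings."""
    findings = extract_findings(image)
    return mllm(build_prompt(findings))

# Toy usage with a stub model standing in for the MLLM:
report = generate_report(image=None, mllm=lambda p: "[grounded report]\n" + p)
```

In this sketch the findings act as an intermediate, auditable factual layer: if the stage-one classifier omits or hallucinates a finding, the error is visible before generation, which is the failure mode the framework targets.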

📝 Abstract
The automatic generation of medical reports utilizing Multimodal Large Language Models (MLLMs) frequently encounters challenges related to factual instability, which may manifest as the omission of findings or the incorporation of inaccurate information, thereby constraining their applicability in clinical settings. Current methodologies typically produce reports based directly on image features, which inherently lack a definitive factual basis. In response to this limitation, we introduce Fact-Flow, an innovative framework that separates the process of visual fact identification from the generation of reports. This is achieved by initially predicting clinical findings from the image, which subsequently directs the MLLM to produce a report that is factually precise. A pivotal advancement of our approach is a pipeline that leverages a Large Language Model (LLM) to autonomously create a dataset of labeled medical findings, effectively eliminating the need for expensive manual annotation. Extensive experimental evaluations conducted on two disease-focused medical datasets validate the efficacy of our method, demonstrating a significant enhancement in factual accuracy compared to state-of-the-art models, while concurrently preserving high standards of text quality.
Problem

Research questions and friction points this paper is trying to address:

- factual instability
- medical report generation
- Multimodal Large Language Models
- clinical findings
- factually inaccurate information
Innovation

Methods, ideas, or system contributions that make the work stand out:

- Fact-Flow
- factual accuracy
- LLM-bootstrapped annotation
- medical report generation
- multimodal large language models
Authors

- Cunyuan Yang, Zhejiang University
- Dejuan Song, The Second Affiliated Hospital Zhejiang University School of Medicine
- Xiaotao Pang, Hangzhou Pu Jian Medical Technology Co., Ltd.
- Qianqian Shen, Zhejiang University
- Wenjie Nie, Zhejiang University
- Yifan Huang, Zhejiang University
- Lei Wu, Zhejiang University
- Wei Han, The Second Affiliated Hospital Zhejiang University School of Medicine
- Haishuai Wang, Harvard University
- Jiajun Bu, Zhejiang University