🤖 AI Summary
Medical vision-language models (VLMs) frequently generate quantitative measurement hallucinations (e.g., erroneous endotracheal tube positioning) in chest X-ray report generation, undermining clinical reliability. To address this, we propose FactCheXcker, a modular verification framework that casts radiological measurement validation as an executable Python code-generation task. Given a VLM-generated report, FactCheXcker extracts measurable findings with rule-guided queries, uses a large language model to synthesize code that solves them, corrects critical measurements, and integrates the corrections into the final report. The approach requires no fine-tuning and enables plug-and-play hallucination mitigation. Evaluated on MIMIC-CXR across 11 state-of-the-art report-generation models, FactCheXcker reduces the mean absolute error (MAE) of quantitative measurements by 94.0% on average, substantially improving measurement accuracy while preserving linguistic quality and clinical readability.
📝 Abstract
Medical vision-language models often struggle to generate accurate quantitative measurements in radiology reports, leading to hallucinations that undermine clinical reliability. We introduce FactCheXcker, a modular framework that mitigates measurement hallucinations in radiology reports through a query-code-update paradigm. Specifically, FactCheXcker employs specialized modules and the code-generation capabilities of large language models to solve measurement queries generated from the original report, then incorporates the extracted measurable findings into an updated report. We evaluate FactCheXcker on endotracheal tube placement, which accounts for an average of 78% of report measurements, using the MIMIC-CXR dataset and 11 medical report-generation models. Our results show that FactCheXcker significantly reduces hallucinations, improves measurement precision, and maintains the quality of the original reports. Specifically, FactCheXcker improves the performance of all 11 models, reducing measurement hallucinations by an average of 94.0% as measured by mean absolute error.
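The query-code-update paradigm described above can be sketched as a minimal pipeline. This is an illustrative assumption, not the authors' actual implementation: every function name is hypothetical, and the "code" step is stubbed with a fixed measurement where the real system would execute LLM-synthesized image-analysis code.

```python
# Hypothetical sketch of a query-code-update loop for measurement
# de-hallucination. All names are illustrative, not FactCheXcker's API.
import re


def extract_measurement_queries(report: str) -> list[str]:
    """Query step: detect measurable findings mentioned in the report."""
    queries = []
    if re.search(r"endotracheal tube|ET tube", report, re.IGNORECASE):
        queries.append("Measure ET tube tip distance from the carina (cm).")
    return queries


def solve_query_with_code(query: str, measured_cm: float) -> float:
    """Code step: in the real system, an LLM synthesizes Python that calls
    image-analysis tools on the X-ray; here we return a stub measurement."""
    return measured_cm


def update_report(report: str, measured_cm: float) -> str:
    """Update step: replace the hallucinated value with the computed one."""
    return re.sub(r"\d+(\.\d+)?\s*cm", f"{measured_cm:.1f} cm", report, count=1)


report = "Endotracheal tube tip is 7.5 cm above the carina."
queries = extract_measurement_queries(report)
corrected = update_report(report, solve_query_with_code(queries[0], 4.2))
print(corrected)  # → "Endotracheal tube tip is 4.2 cm above the carina."
```

The modular design means the verifier wraps around any report-generation model: only the report text and the image-derived measurement cross the module boundary, which is what makes the approach plug-and-play with no fine-tuning.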