🤖 AI Summary
Generative AI models for automatic radiology report generation frequently exhibit “hallucinations,” producing clinically implausible findings that compromise diagnostic safety. To address this, we propose the first image-aligned true/false sentence dataset specifically designed for radiology reports and introduce a cross-modal (image–text) fine-grained fact-checking framework. Our approach integrates CLIP-style multimodal encoding, contrastive learning, and a binary discriminative head; it employs controlled perturbations of ground-truth reports to generate synthetic false sentences, enabling weakly supervised training without manual annotation. The framework supports real-time, sentence-level self-validation for generative models, verifying factual consistency against corresponding medical images. Evaluated on clinical radiology reports, our method achieves 92.3% accuracy in sentence-level veracity classification—marking a substantial improvement in the clinical reliability and safety of AI-assisted diagnosis.
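The image–text pairing with a binary discriminative head described above can be sketched as follows. This is a minimal toy illustration only: the real system uses trained CLIP-style encoders, and the embedding size, `toy_encode` stand-in, and untrained weights here are all assumptions for demonstration.

```python
import math
import random

DIM = 8  # toy embedding size; CLIP-style encoders typically use 512+

def toy_encode(seed: int) -> list[float]:
    """Stand-in for an image or text encoder (illustrative only)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

def binary_head(image_emb, sentence_emb, weights, bias):
    """Score an (image, sentence) pair: fuse by concatenation, then sigmoid."""
    fused = image_emb + sentence_emb  # simple concatenation fusion
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # P(sentence is consistent with image)

image_emb = toy_encode(seed=1)
sentence_emb = toy_encode(seed=2)
weights = [0.1] * (2 * DIM)  # untrained, illustrative weights
prob_real = binary_head(image_emb, sentence_emb, weights, bias=0.0)
print(f"P(real) = {prob_real:.3f}")
```

In practice the head would be trained jointly with (or on top of) the frozen multimodal encoders, with real sentences labeled 1 and perturbed sentences labeled 0.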
📝 Abstract
With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy, and reduce overall costs. However, it is also well known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method for fact-checking AI-generated reports against their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train the examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground-truth radiology reports associated with the images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn a mapping to real/fake labels. We demonstrate the examiner's utility by detecting and removing fake sentences from automatically generated reports. Future generative AI approaches can use the resulting tool to validate their reports, leading to more responsible use of AI in expediting clinical workflows.
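The fake-report construction via controlled perturbation can be illustrated with a toy rule that flips a finding's polarity, laterality, or severity. The swap table and sentences below are illustrative assumptions; the paper's actual perturbation scheme is broader.

```python
# Toy perturbation of a ground-truth finding into a synthetic "fake" sentence.
# The swap table is illustrative; a real scheme would cover many more
# finding alterations (e.g., location, size, presence of devices).
SWAPS = [
    ("no evidence of", "evidence of"),  # polarity flip
    ("left", "right"),                  # laterality flip
    ("mild", "severe"),                 # severity flip
]

def perturb(sentence: str) -> str:
    """Return a factually altered copy of a report sentence (first match wins)."""
    lower = sentence.lower()
    for a, b in SWAPS:
        if a in lower:
            return lower.replace(a, b, 1)
        if b in lower:
            return lower.replace(b, a, 1)
    return lower  # no known term found: sentence returned unchanged

real = "No evidence of pleural effusion."
fake = perturb(real)
print(real, "->", fake)
```

Each perturbed sentence keeps the original image pairing but receives a "fake" label, which is what enables weakly supervised training of the examiner without manual annotation.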