🤖 AI Summary
This study systematically uncovers bidirectional hallucinations in large language models (LLMs) applied to medical imaging: image-to-text (e.g., radiology report generation from X-ray/CT/MRI) and text-to-image (e.g., clinical-prompt-driven synthetic imaging). To address core deficiencies, including factual inconsistency and anatomical implausibility, we propose a multimodal (X-ray, CT, MRI) evaluation framework grounded in dual expert criteria: clinical semantic consistency and anatomical plausibility. To our knowledge, this is the first work to conduct a controlled, cross-task comparative analysis of hallucinations in both medical image understanding and generation. We identify and characterize the synergistic impact of architectural biases and training data limitations on the emergence of medical hallucinations. Finally, we outline clinically grounded mitigation strategies, emphasizing interpretability, domain-specific constraints, and human-in-the-loop validation, to enhance model reliability. Our findings provide empirical evidence and methodological foundations for developing safe, trustworthy AI systems in clinical imaging.
📝 Abstract
Large Language Models (LLMs) are increasingly applied to medical imaging tasks, including image interpretation and synthetic image generation. However, these models often produce hallucinations: confident but incorrect outputs that can mislead clinical decisions. This study examines hallucinations in two directions: image-to-text, where LLMs generate reports from X-ray, CT, or MRI scans, and text-to-image, where models create medical images from clinical prompts. We analyze errors such as factual inconsistencies and anatomical inaccuracies, evaluating outputs against expert-informed criteria across imaging modalities. Our findings reveal common patterns of hallucination in both interpretive and generative tasks, with implications for clinical reliability. We also discuss factors contributing to these failures, including model architecture and training data. By systematically studying both image understanding and generation, this work provides insights into improving the safety and trustworthiness of LLM-driven medical imaging systems.