Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing X-ray report generation models often neglect fine-grained visual-semantic associations of critical lesion regions, leading to incomplete disease descriptions and insufficient clinical relevance. To address this, we propose a dual-Hopfield network framework that emulates the radiologist’s “image interpretation–memory recall–report writing” workflow: a visual Hopfield network models pixel-level correspondences between lesion regions and textual tokens, while a report Hopfield network retrieves diagnostic knowledge from historical reports. We further introduce a joint visual token activation mechanism driven by disease-specific query tokens and Class Activation Mapping (CAM) heatmaps. Our end-to-end framework integrates a classification backbone, CAM, dual Hopfield networks, and a large language model. Evaluated on IU X-ray, MIMIC-CXR, and CheXpert Plus, it achieves state-of-the-art performance, significantly improving disease term accuracy and clinical credibility. The code is publicly available.

📝 Abstract
X-ray image based medical report generation has achieved significant progress in recent years with the help of large language models; however, these models have not fully exploited the effective information in visual image regions, resulting in reports that are linguistically sound but insufficient in describing key diseases. In this paper, we propose a novel associative memory-enhanced X-ray report generation model that effectively mimics the process by which professional doctors write medical reports. It considers the mining of both global and local visual information and associates historical report information to better complete the writing of the current report. Specifically, given an X-ray image, we first utilize a classification model along with its activation maps to mine visual regions highly associated with diseases and to learn disease query tokens. Then, we employ a visual Hopfield network to establish memory associations for disease-related tokens and a report Hopfield network to retrieve report memory information. This process facilitates the generation of high-quality reports based on a large language model and achieves state-of-the-art performance on multiple benchmark datasets, including IU X-ray, MIMIC-CXR, and CheXpert Plus. The source code of this work is released at https://github.com/Event-AHU/Medical_Image_Analysis.
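The visual and report Hopfield networks in the abstract are presumably instances of modern (continuous) Hopfield networks, whose retrieval step is a softmax-weighted attention of queries over stored memory patterns. A minimal sketch under that assumption (generic retrieval rule, not the paper's implementation):

```python
import numpy as np

def hopfield_retrieve(queries, memories, beta=1.0):
    """One modern-Hopfield update: each query attends over stored memory
    patterns and is replaced by a softmax-weighted mix of those patterns."""
    scores = beta * queries @ memories.T           # (n_q, n_mem) similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ memories                      # retrieved patterns

# Toy example: noisy probes snap back toward the stored patterns.
rng = np.random.default_rng(0)
memories = rng.standard_normal((8, 16))                       # stored patterns
queries = memories[:2] + 0.1 * rng.standard_normal((2, 16))   # noisy probes
retrieved = hopfield_retrieve(queries, memories, beta=4.0)
```

In the paper's setting, the stored patterns would correspond to disease-related visual tokens (visual Hopfield network) or historical report representations (report Hopfield network); a sharper `beta` makes retrieval closer to a nearest-pattern lookup.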
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
X-ray Report Generation
Image Information Utilization
Innovation

Methods, ideas, or system contributions that make the work stand out.

X-ray Image Analysis
Disease Feature Learning
Accurate Report Generation
Xiao Wang
Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei 230601, China; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, Hefei 230601, China; School of Computer Science and Technology, Anhui University, Hefei 230601, China
Fuling Wang
Anhui University
Medical Report Generation
Haowen Wang
Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei 230601, China; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, Hefei 230601, China; School of Computer Science and Technology, Anhui University, Hefei 230601, China
Bowei Jiang
Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei 230601, China; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University, Hefei 230601, China; School of Computer Science and Technology, Anhui University, Hefei 230601, China
Chuanfu Li
First Affiliated Hospital of Anhui University of Chinese Medicine, Hefei 230022, China
Yaowei Wang
The Hong Kong Polytechnic University
Yonghong Tian
Peng Cheng Laboratory, Shenzhen, China, and National Engineering Laboratory for Video Technology, School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Jin Tang
Anhui University
Computer vision, intelligent video analysis