🤖 AI Summary
Existing X-ray report generation models often neglect fine-grained visual-semantic associations of critical lesion regions, leading to incomplete disease descriptions and insufficient clinical relevance. To address this, we propose a dual-Hopfield network framework that emulates the radiologist’s “image interpretation–memory recall–report writing” workflow: a visual Hopfield network models pixel-level correspondences between lesion regions and textual tokens, while a report Hopfield network retrieves diagnostic knowledge from historical reports. We further introduce a joint visual token activation mechanism driven by disease-specific query tokens and Class Activation Mapping (CAM) heatmaps. Our end-to-end framework integrates a classification backbone, CAM, dual Hopfield networks, and a large language model. Evaluated on IU X-ray, MIMIC-CXR, and CheXpert Plus, it achieves state-of-the-art performance, significantly improving disease term accuracy and clinical credibility. The code is publicly available.
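The joint visual token activation described above can be sketched as follows. This is a minimal illustration of standard Class Activation Mapping used to select disease-relevant spatial tokens, assuming a global-average-pooled CNN classifier; the function name, toy shapes, and the 25% keep ratio are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def cam_select_tokens(features, fc_weight, class_idx, keep_ratio=0.25):
    """Class Activation Mapping over a CNN feature map (C, H, W):
    weight channels by the classifier weights of the target disease,
    then keep the most activated spatial positions as visual tokens.
    (Illustrative sketch; shapes and ratio are assumptions.)"""
    cam = np.tensordot(fc_weight[class_idx], features, axes=1)  # (H, W) heatmap
    cam = np.maximum(cam, 0)                                    # ReLU, as in standard CAM
    flat = cam.reshape(-1)
    k = max(1, int(keep_ratio * flat.size))
    keep = np.argsort(flat)[-k:]                                # top-k activated positions
    tokens = features.reshape(features.shape[0], -1).T          # (H*W, C) patch tokens
    return tokens[keep], cam

# Toy example: 64-channel 7x7 feature map, 14-class disease head.
rng = np.random.default_rng(0)
features = rng.random((64, 7, 7))
fc_weight = rng.random((14, 64))
tokens, cam = cam_select_tokens(features, fc_weight, class_idx=3)
print(tokens.shape)  # 25% of the 49 spatial positions survive
```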
📝 Abstract
X-ray image-based medical report generation has achieved significant progress in recent years with the help of large language models; however, existing models have not fully exploited the effective information in visual image regions, resulting in reports that are linguistically sound but insufficient in describing key diseases. In this paper, we propose a novel associative memory-enhanced X-ray report generation model that effectively mimics the process by which professional doctors write medical reports. It mines both global and local visual information and associates historical report information to better support the writing of the current report. Specifically, given an X-ray image, we first utilize a classification model and its activation maps to mine the visual regions highly associated with diseases and to learn disease query tokens. Then, we employ a visual Hopfield network to establish memory associations for disease-related tokens, and a report Hopfield network to retrieve report memory information. This process facilitates the generation of high-quality reports based on a large language model and achieves state-of-the-art performance on multiple benchmark datasets, including IU X-ray, MIMIC-CXR, and CheXpert Plus. The source code of this work is released at https://github.com/Event-AHU/Medical_Image_Analysis.
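The report-memory retrieval step above can be sketched with a single update of a modern (continuous) Hopfield network, whose retrieval rule coincides with softmax attention over stored patterns. This is a hedged illustration under that assumption; the function name, toy dimensions, and `beta` value are not from the paper.

```python
import numpy as np

def hopfield_retrieve(query, memory, beta=1.0):
    """One update step of a modern continuous Hopfield network:
    the query attends over the stored patterns and returns a convex
    combination of them, softmax(beta * q M^T) M.
    (Illustrative sketch, not the paper's exact module.)"""
    scores = beta * memory @ query            # similarity to each stored pattern
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory                   # retrieved (denoised) pattern

# Toy example: a bank of "report memory" vectors and a noisy disease cue.
rng = np.random.default_rng(0)
memory = rng.standard_normal((8, 16))              # 8 stored patterns, dim 16
query = memory[3] + 0.1 * rng.standard_normal(16)  # noisy cue for pattern 3
retrieved = hopfield_retrieve(query, memory, beta=4.0)
print(np.argmax(memory @ retrieved))               # index of the recalled pattern
```

With a sufficiently large `beta`, the softmax concentrates on the best-matching stored pattern, so the noisy cue is mapped back to (approximately) the clean memory it came from.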