🤖 AI Summary
This study addresses the challenge of generating on-demand, radiology-style descriptions of pulmonary nodules from chest CT images, tailored to clinicians’ specific concerns. The structured morphological annotations in the LIDC-IDRI dataset are reformulated into a clinically oriented visual question answering (VQA) task. By combining cropped CT regions, VQA modeling, and natural language generation, the authors construct a pulmonary nodule VQA dataset and fine-tune a model that produces radiological findings in response to natural language queries. The generated descriptions achieve a CIDEr score of 3.896 and show high agreement with the reference findings on key morphological features, supporting the method’s potential to make human–AI diagnostic interaction more flexible and clinically useful.
📝 Abstract
Interpreting imaging findings based on morphological characteristics is important for diagnosing pulmonary nodules on chest computed tomography (CT) images. In this study, we constructed a visual question answering (VQA) dataset from structured data in an open dataset and investigated a method for generating image findings for chest CT images. The aim is to enable interactive diagnostic support that presents findings in response to questions reflecting physicians' interests, rather than as fixed descriptions. We used chest CT images from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset. Regions of interest surrounding the pulmonary nodules were extracted from these images, and image findings and questions were defined based on the morphological characteristics recorded in the database. A dataset comprising triplets of cropped images, corresponding questions, and image findings was constructed, and a VQA model was fine-tuned on it. Language evaluation metrics such as BLEU were used to evaluate the generated image findings. The constructed VQA dataset contained image findings phrased naturally as radiological descriptions. The generated image findings achieved a high CIDEr score of 3.896 and showed high agreement with the reference findings in an evaluation based on morphological characteristics. In conclusion, we constructed a VQA dataset for chest CT images from the structured morphological information in the LIDC-IDRI dataset and investigated a method for generating image findings in response to questions. Based on the generated results and evaluation metric scores, the proposed method is effective as an interactive diagnostic support system that can present image findings according to physicians' interests.
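The dataset construction step described above can be illustrated with a minimal Python sketch that turns structured morphological ratings into question and finding text. This is not the authors' implementation: the question wording, the finding phrases, the three-way grouping of the 1 to 5 ratings, and the names `VQASample`, `rating_to_phrase`, and `build_samples` are all assumptions made for illustration. Only the characteristic names (`margin`, `spiculation`, `texture`) correspond to attributes recorded in LIDC-IDRI.

```python
from dataclasses import dataclass


@dataclass
class VQASample:
    """One dataset entry: cropped-image identifier, question, reference finding."""
    image_id: str
    question: str
    finding: str


# Hypothetical question templates, one per morphological characteristic.
QUESTIONS = {
    "margin": "How is the margin of the nodule?",
    "spiculation": "Is the nodule spiculated?",
    "texture": "What is the internal texture of the nodule?",
}

# Hypothetical phrases for low (1-2), middle (3), and high (4-5) ratings.
PHRASES = {
    "margin": ("a poorly defined margin", "a moderately defined margin", "a sharp margin"),
    "spiculation": ("no spiculation", "mild spiculation", "marked spiculation"),
    "texture": ("a ground-glass texture", "a part-solid texture", "a solid texture"),
}


def rating_to_phrase(name: str, rating: int) -> str:
    """Map a 1-5 rating to a phrase (assumed grouping, not from the paper)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating for {name!r} must be in 1..5, got {rating}")
    low, mid, high = PHRASES[name]
    if rating <= 2:
        return low
    if rating == 3:
        return mid
    return high


def build_samples(image_id: str, ratings: dict[str, int]) -> list[VQASample]:
    """Create one question/finding pair per characteristic that has a template."""
    samples = []
    for name, rating in ratings.items():
        if name not in QUESTIONS:
            continue  # characteristics without a template are skipped
        finding = f"The nodule shows {rating_to_phrase(name, rating)}."
        samples.append(VQASample(image_id, QUESTIONS[name], finding))
    return samples


if __name__ == "__main__":
    # "nodule_0001" is a placeholder identifier for a cropped region of interest.
    for s in build_samples("nodule_0001", {"margin": 5, "spiculation": 1, "texture": 3}):
        print(s.question, "->", s.finding)
```

In a real pipeline, each `image_id` would refer to a cropped region of interest around a nodule, and the resulting triplets would serve as fine-tuning data for the VQA model.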