🤖 AI Summary
Current radiology report generation (RRG) methods suffer from inadequate accuracy in describing lesion attributes and from poor interpretability, undermining clinical trust. To address this, we propose the Chain of Diagnosis (CoD) framework: it employs a question-answering (QA) driven mechanism to extract key imaging findings and prompts a large language model with these QA diagnoses for report generation; it introduces two grounding modules, diagnosis grounding, which links each generated sentence to its supporting QA diagnosis, and lesion grounding, which localizes abnormalities in the image, enhancing interpretability; and it supports label-efficient training across datasets with heterogeneous annotations via omni-supervised learning with clinical consistency. We construct an omni-labeled RRG dataset containing both QA pairs and lesion bounding boxes, along with an evaluation tool for assessing how accurately reports describe lesion location and severity. Experiments demonstrate that CoD consistently outperforms both specialist and generalist baselines on two RRG benchmarks, achieving strong report accuracy while providing explainability by grounding generated sentences to QA diagnoses and image regions.
📝 Abstract
Despite the progress of radiology report generation (RRG), existing works face two challenges: 1) performance in clinical efficacy is unsatisfactory, especially for describing lesion attributes; 2) the generated text lacks explainability, making it difficult for radiologists to trust the results. To address these challenges, we focus on a trustworthy RRG model, which not only generates accurate descriptions of abnormalities but also provides the basis for its predictions. To this end, we propose a framework named Chain of Diagnosis (CoD), which maintains a chain of diagnostic reasoning for clinically accurate and explainable RRG. It first generates question-answer (QA) pairs via diagnostic conversation to extract key findings, then prompts a large language model with the QA diagnoses for accurate generation. To enhance explainability, a diagnosis grounding module is designed to match QA diagnoses with generated sentences, where the diagnoses act as a reference. Moreover, a lesion grounding module is designed to locate abnormalities in the image, further improving the working efficiency of radiologists. To facilitate label-efficient training, we propose an omni-supervised learning strategy with clinical consistency that leverages various types of annotations from different datasets. Our efforts lead to 1) an omni-labeled RRG dataset with QA pairs and lesion boxes; 2) an evaluation tool for assessing the accuracy of reports in describing lesion location and severity; and 3) extensive experiments demonstrating the effectiveness of CoD, where it consistently outperforms both specialist and generalist models on two RRG benchmarks and shows promising explainability by accurately grounding generated sentences to QA diagnoses and images.
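To make the described pipeline concrete, below is a minimal, illustrative Python sketch of the inference flow the abstract outlines: QA-based finding extraction, LLM-prompted report generation, and diagnosis grounding. All names (`extract_qa_diagnoses`, `generate_report`, `ground_diagnoses`, `QADiagnosis`) are hypothetical placeholders, not the paper's actual API; the grounding step uses simple word overlap where the paper uses a learned matching module, and lesion grounding (bounding-box prediction) is omitted.

```python
# Illustrative sketch of the CoD inference flow, assuming hypothetical
# component names; the paper's actual models and interfaces may differ.
from dataclasses import dataclass


@dataclass(frozen=True)
class QADiagnosis:
    question: str  # e.g., "Is there a pleural effusion?"
    answer: str    # e.g., "Small left pleural effusion."


def extract_qa_diagnoses(image) -> list[QADiagnosis]:
    """Stage 1: diagnostic conversation over the image extracts key
    findings as QA pairs. Stubbed here with fixed examples."""
    return [
        QADiagnosis("Is there a pleural effusion?", "Small left pleural effusion."),
        QADiagnosis("Is the heart enlarged?", "Mild cardiomegaly."),
    ]


def generate_report(qa_pairs: list[QADiagnosis]) -> list[str]:
    """Stage 2: prompt a large language model with the QA diagnoses to
    write the report. The LLM call is stubbed with canned sentences."""
    prompt = "Write a radiology report from these findings:\n" + "\n".join(
        f"Q: {qa.question} A: {qa.answer}" for qa in qa_pairs
    )
    _ = prompt  # placeholder for an actual LLM call
    return ["There is a small left pleural effusion.",
            "The heart is mildly enlarged."]


def ground_diagnoses(sentences: list[str],
                     qa_pairs: list[QADiagnosis]) -> dict[str, str]:
    """Stage 3 (diagnosis grounding): link each generated sentence to the
    QA diagnosis that supports it. Word overlap stands in for the paper's
    learned matching; lesion grounding (image boxes) is not shown."""
    def overlap(a: str, b: str) -> int:
        return len(set(a.lower().split()) & set(b.lower().split()))

    return {s: max(qa_pairs, key=lambda qa: overlap(s, qa.answer)).answer
            for s in sentences}


if __name__ == "__main__":
    qa = extract_qa_diagnoses(image=None)
    report = generate_report(qa)
    for sentence, evidence in ground_diagnoses(report, qa).items():
        print(f"{sentence}  <- supported by: {evidence}")
```

The stub only shows the data flow; in the paper, each stage is a trained model, and the grounding modules are what let a radiologist trace every generated sentence back to a QA diagnosis and an image region.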