🤖 AI Summary
This work addresses a key limitation of existing public CT datasets: the absence of lesion-level annotations, which hinders progress in AI-driven lesion detection and radiology report generation. To bridge this gap, we introduce CT-Bench, a first-of-its-kind benchmark dataset for multimodal lesion understanding in CT imaging. It comprises 20,335 lesions from 7,795 CT studies, annotated with bounding boxes, descriptive captions, and size measurements, along with 2,850 multitask visual question answering (VQA) pairs covering localization, description, size estimation, and attribute classification. Hard negative samples are included to reflect real-world clinical complexity. Using this dataset, we fine-tune vision-language models and medical CLIP variants and evaluate them systematically through multitask VQA. Experiments show substantial performance gains on both benchmark components, with model outputs showing strong agreement with radiologist assessments, supporting CT-Bench’s efficacy and clinical relevance.
📝 Abstract
Artificial intelligence (AI) can automatically delineate lesions on computed tomography (CT) and generate radiology report content, yet progress is limited by the scarcity of publicly available CT datasets with lesion-level annotations. To bridge this gap, we introduce CT-Bench, a first-of-its-kind benchmark dataset comprising two components: a Lesion Image and Metadata Set containing 20,335 lesions from 7,795 CT studies with bounding boxes, descriptions, and size information, and a multitask visual question answering benchmark with 2,850 QA pairs covering lesion localization, description, size estimation, and attribute categorization. Hard negative examples are included to reflect real-world diagnostic challenges. We evaluate multiple state-of-the-art multimodal models, including vision-language and medical CLIP variants, by comparing their performance to radiologist assessments, demonstrating the value of CT-Bench as a comprehensive benchmark for lesion analysis. Moreover, fine-tuning models on the Lesion Image and Metadata Set yields significant performance gains across both components, underscoring the clinical utility of CT-Bench.