🤖 AI Summary
Radiologists face significant challenges in detecting and characterizing small abdominal tumors (<2 cm) on CT, including precise localization, size estimation, morphological assessment, attenuation measurement, and analysis of spatial relationships with surrounding structures. To address this, we introduce RadGPT, an anatomy-aware vision-language AI agent, together with the first publicly available 3D medical image-text tumor dataset. The dataset comprises 9,262 abdominal CT scans, of which 2,947 contain tumors (8,562 tumor instances in total). Our method integrates joint tumor-organ segmentation, multi-granularity vision-language alignment, and radiology-knowledge-enhanced generative modeling. It supports per-voxel anatomical localization (eight liver sub-segments and three pancreatic sub-segments), pancreatic T-stage classification (T1-T4), and independent characterization of multiple lesions. On data from unseen hospitals, small-tumor (<2 cm) detection reaches sensitivity/specificity of 80/73% for liver, 92/78% for kidney, and 77/77% for pancreatic tumors. After iterative radiologist review and refinement, the released reports total over 1.8 million text tokens and 2.7 million images, and 948 of them describe early-stage tumors.
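Sensitivity and specificity are the standard pair of detection metrics. The sketch below is a minimal illustration that assumes scan-level scoring: a scan counts as positive if it contains at least one tumor under 2 cm. The summary does not state the paper's exact matching protocol. The sketch shows how the two percentages follow from per-scan outcomes. `ScanResult` and `sensitivity_specificity` are hypothetical names, and the counts in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    # Hypothetical per-scan record, assuming scan-level scoring.
    has_small_tumor: bool      # ground truth: at least one tumor < 2 cm
    predicted_positive: bool   # the model reported a small tumor


def sensitivity_specificity(results):
    """Return (sensitivity, specificity) from per-scan outcomes."""
    tp = sum(r.has_small_tumor and r.predicted_positive for r in results)
    fn = sum(r.has_small_tumor and not r.predicted_positive for r in results)
    tn = sum(not r.has_small_tumor and not r.predicted_positive for r in results)
    fp = sum(not r.has_small_tumor and r.predicted_positive for r in results)
    # Sensitivity = TP / (TP + FN): share of tumor scans that were flagged.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    # Specificity = TN / (TN + FP): share of tumor-free scans left unflagged.
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Made-up counts: 8 of 10 tumor scans flagged, 15 of 20 normal scans cleared.
results = (
    [ScanResult(True, True)] * 8
    + [ScanResult(True, False)] * 2
    + [ScanResult(False, False)] * 15
    + [ScanResult(False, True)] * 5
)
print(sensitivity_specificity(results))  # (0.8, 0.75)
```

A detection threshold trades one metric against the other, so quoting them as a pair (e.g. 92/78%) is more informative than either number alone.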
📝 Abstract
With over 85 million CT scans performed annually in the United States, creating tumor-related reports is a challenging and time-consuming task for radiologists. To address this need, we present RadGPT, an Anatomy-Aware Vision-Language AI Agent for generating detailed reports from CT scans. RadGPT first segments tumors, including benign cysts and malignant tumors, and their surrounding anatomical structures, then transforms this information into both structured reports and narrative reports. These reports provide tumor size, shape, location, attenuation, volume, and interactions with surrounding blood vessels and organs. Extensive evaluation on data from unseen hospitals shows that RadGPT can produce accurate reports, with high sensitivity/specificity for small tumor (<2 cm) detection: 80/73% for liver tumors, 92/78% for kidney tumors, and 77/77% for pancreatic tumors. For large tumors, sensitivity ranges from 89% to 97%. These results significantly surpass the state of the art in abdominal CT report generation. We used RadGPT to generate reports for 17 public datasets. Through radiologist review and refinement, we ensured the reports' accuracy and created the first publicly available image-text 3D medical dataset. It comprises over 1.8 million text tokens and 2.7 million images from 9,262 CT scans, including 2,947 tumor scans/reports of 8,562 tumor instances. Our reports (1) localize tumors in eight liver sub-segments and three pancreatic sub-segments annotated per voxel; (2) determine pancreatic tumor stage (T1-T4) in 260 reports; and (3) present individual analyses of multiple tumors, which is rare in human-made reports. Importantly, 948 of the reports are for early-stage tumors.
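The step from segmentation to structured report can be illustrated with a minimal sketch. It assumes four inputs: a tumor mask given as a set of voxel indices, a CT volume given as a map from voxel to Hounsfield units (HU), a per-voxel sub-segment label map, and the scan's voxel spacing in millimetres. `tumor_report_fields` and the labels `S5`/`S6` are hypothetical. The diameter is a crude proxy (the largest distance between voxel centres), not necessarily the measurement protocol RadGPT uses.

```python
import math
from collections import Counter
from itertools import combinations

Voxel = tuple[int, int, int]  # (z, y, x) voxel index


def tumor_report_fields(
    tumor: set[Voxel],
    ct_hu: dict[Voxel, float],
    subsegment: dict[Voxel, str],
    spacing_mm: tuple[float, float, float],
) -> dict:
    """Hypothetical helper: structured-report fields for one segmented tumor."""
    if not tumor:
        raise ValueError("empty tumor mask")
    sz, sy, sx = spacing_mm

    # Volume: voxel count times the physical volume of one voxel (1 mL = 1000 mm^3).
    volume_ml = len(tumor) * sz * sy * sx / 1000.0

    # Attenuation: mean CT value inside the mask, in HU.
    mean_hu = sum(ct_hu[v] for v in tumor) / len(tumor)

    # Crude size proxy: largest distance between any two voxel centres, in mm.
    def to_mm(v: Voxel) -> tuple[float, float, float]:
        return (v[0] * sz, v[1] * sy, v[2] * sx)

    diameter_mm = max(
        (math.dist(to_mm(a), to_mm(b)) for a, b in combinations(tumor, 2)),
        default=0.0,
    )

    # Location: the sub-segment label that covers the most tumor voxels.
    counts = Counter(subsegment.get(v, "outside") for v in tumor)
    location, _ = counts.most_common(1)[0]

    return {
        "volume_ml": volume_ml,
        "mean_hu": mean_hu,
        "diameter_mm": diameter_mm,
        "location": location,
        "small_tumor": diameter_mm < 20.0,  # the abstract's < 2 cm threshold
    }


# Toy 3x3x3 tumor with 2 mm slices and 1 mm in-plane spacing.
cube = {(z, y, x) for z in range(3) for y in range(3) for x in range(3)}
hu = {v: 30.0 + 10.0 * v[0] for v in cube}           # 30, 40, 50 HU per slice
seg = {v: "S5" if v[2] < 2 else "S6" for v in cube}  # 18 voxels S5, 9 voxels S6
print(tumor_report_fields(cube, hu, seg, spacing_mm=(2.0, 1.0, 1.0)))
```

Counting voxels per sub-segment, rather than looking only at the tumor centroid, also reveals tumors that straddle a boundary. A variant could list every label whose voxel share exceeds a chosen threshold.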