ReXGroundingCT: A 3D Chest CT Dataset for Segmentation of Findings from Free-Text Reports

📅 2025-07-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical challenge of aligning free-text radiology reports with precise anatomical locations in 3D CT volumes. To this end, we introduce ReXGroundingCT, the first publicly available 3D chest CT dataset of its kind, comprising 3,142 non-contrast CT scans. Annotations were generated via a three-stage pipeline of GPT-4-assisted finding extraction, expert-guided 3D segmentation, and quality control by board-certified radiologists, yielding pixel-level 3D segmentations for 8,028 clinical findings across 16,301 annotated entities. ReXGroundingCT enables sentence-level grounding of free-text descriptions to 3D lesions, surpassing conventional structured-label paradigms, and establishes the first benchmark for clinical language to 3D imaging alignment. The dataset includes multiple segmentation variants and standardized train/val/test splits, substantially advancing cross-modal research in radiology report generation, lesion localization, and multimodal medical understanding.

📝 Abstract
We present ReXGroundingCT, the first publicly available, manually annotated dataset linking free-text radiology findings to pixel-level segmentations in 3D chest CT scans. While prior datasets have relied on structured labels or predefined categories, ReXGroundingCT captures the full expressiveness of clinical language represented in free text and grounds it to spatially localized 3D segmentation annotations in volumetric imaging. This addresses a critical gap in medical AI: the ability to connect complex, descriptive text, such as "3 mm nodule in the left lower lobe", to its precise anatomical location in three-dimensional space, a capability essential for grounded radiology report generation systems. The dataset comprises 3,142 non-contrast chest CT scans paired with standardized radiology reports from the CT-RATE dataset. In a systematic three-stage pipeline, GPT-4 extracted positive lung and pleural findings, which were then manually segmented by expert annotators. A total of 8,028 findings across 16,301 entities were annotated, with quality control performed by board-certified radiologists. Approximately 79% of findings are focal abnormalities, while 21% are non-focal. The training set includes up to three representative segmentations per finding, while the validation and test sets contain exhaustive labels for each finding entity. ReXGroundingCT establishes a new benchmark for developing and evaluating sentence-level grounding and free-text medical segmentation models in chest CT. The dataset can be accessed at https://huggingface.co/datasets/rajpurkarlab/ReXGroundingCT.
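The abstract describes findings as either focal or non-focal, each grounded to one or more segmented entities. A minimal sketch of what a grounded-finding record might look like, and how one could partition findings by focality, is shown below; the class and field names here are illustrative assumptions, not the dataset's actual schema on Hugging Face.

```python
from dataclasses import dataclass, field

# Hypothetical record layout for one grounded finding; the real
# ReXGroundingCT schema may use different field names and mask formats.
@dataclass
class GroundedFinding:
    report_sentence: str   # free-text finding, e.g. "3 mm nodule in the left lower lobe"
    is_focal: bool         # ~79% of findings in the dataset are focal
    entity_masks: list = field(default_factory=list)  # one 3D mask per annotated entity

def split_by_focality(findings):
    """Partition findings into focal and non-focal groups."""
    focal = [f for f in findings if f.is_focal]
    non_focal = [f for f in findings if not f.is_focal]
    return focal, non_focal

# Toy examples (invented for illustration, not drawn from the dataset):
findings = [
    GroundedFinding("3 mm nodule in the left lower lobe", True, ["mask_0"]),
    GroundedFinding("mild bibasilar atelectasis", False, ["mask_1"]),
]
focal, non_focal = split_by_focality(findings)
print(len(focal), len(non_focal))  # prints: 1 1
```

The train split would hold up to three representative masks per finding in `entity_masks`, while the validation and test splits would list every annotated entity exhaustively.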
Problem

Research questions and friction points this paper is trying to address.

Links free-text radiology findings to 3D CT scan segmentations
Connects descriptive clinical text to precise anatomical locations
Provides benchmark for medical AI segmentation and grounding models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Links free-text reports to 3D CT segmentations
Uses GPT-4 for automated finding extraction
Manual expert annotations with radiologist QC
Mohammed Baharoon
Harvard Medical School
Computer Vision, Multimodal Learning, Unsupervised Learning, Foundation Models
Luyang Luo
Department of Biomedical Informatics, Harvard Medical School, Boston, MA
Michael Moritz
SSM Health, St. Louis, MO
Abhinav Kumar
Icahn School of Medicine at Mount Sinai, New York, NY
Sung Eun Kim
Department of Biomedical Informatics, Harvard Medical School, Boston, MA; National Strategic Technology Research Institute, Seoul National University Hospital, South Korea
Xiaoman Zhang
Harvard University
AI for Medicine, Medical Image Analysis
Miao Zhu
Brigham and Women’s Hospital, Boston, MA
Mahmoud Hussain Alabbad
Chest Radiology Division, Medical Imaging Department, King Abdullah Specialized Children’s Hospital, Riyadh, Saudi Arabia; King Abdulaziz Medical City, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
Maha Sbayel Alhazmi
Chest Radiology Division, Medical Imaging Department, King Abdullah Specialized Children’s Hospital, Riyadh, Saudi Arabia; King Abdulaziz Medical City, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
Neel P. Mistry
Department of Medical Imaging, Royal University Hospital, Saskatoon, SK, Canada
Kent Ryan Kleinschmidt
Saint Louis University School of Medicine, St. Louis, MO
Brady Chrisler
Saint Louis University School of Medicine, St. Louis, MO
Sathvik Suryadevara
Saint Louis University School of Medicine, St. Louis, MO
Sri Sai Dinesh Jaliparthi
Saint Louis University School of Medicine, St. Louis, MO
Noah Michael Prudlo
Saint Louis University School of Medicine, St. Louis, MO
Mark David Marino
Saint Louis University School of Medicine, St. Louis, MO
Jeremy Palacio
Saint Louis University School of Medicine, St. Louis, MO
Rithvik Akula
Saint Louis University School of Medicine, St. Louis, MO
Hong-Yu Zhou
Assistant Professor of Biomedical Engineering, Tsinghua University. Past: Harvard Medical School.
AI for Healthcare, AI for Medicine, Biomedical AI
Ibrahim Ethem Hamamci
MD-PhD Student at University of Zurich | ETH AI Center
Medical Image Analysis, Machine Learning
Scott J. Adams
Department of Medical Imaging, Royal University Hospital, Saskatoon, SK, Canada
Hassan Rayhan AlOmaish
Chest Radiology Division, Medical Imaging Department, King Abdullah Specialized Children’s Hospital, Riyadh, Saudi Arabia; King Abdulaziz Medical City, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
Pranav Rajpurkar
Department of Biomedical Informatics, Harvard Medical School, Boston, MA