AI Summary
A publicly available, multicenter, multimodal head and neck cancer (HNC) dataset has been lacking, which hinders AI development for tumor segmentation, recurrence-free survival (RFS) prediction, and HPV status classification. Method: We introduce HNC-1123, the first open-source multicenter PET/CT dataset for HNC. It comprises 1,123 patients from 10 international centers, with expert-annotated tumor segmentations, radiotherapy dose maps for a subset of patients, and longitudinal clinical follow-up metadata. It is the first standardized integration and high-fidelity annotation of heterogeneous multicenter PET/CT data. All imaging data are anonymized and provided in NIfTI format. We benchmark the dataset end to end with UNet, SegResNet, and multimodal prognostic models. Results: Baseline models trained on the dataset reach a Dice score above 0.82 for automatic segmentation, a C-index of 0.74 for RFS prediction, and an AUC of 0.89 for HPV classification. These results provide reference benchmarks for AI-driven precision radiotherapy and prognostic modeling in HNC.
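The benchmark figures above rely on three standard metrics. The Python sketch below shows how these metrics are commonly defined. It is not the authors' evaluation code. It covers three cases:

- Dice on one case's binary masks, given as flat 0/1 sequences.
- Harrell's C-index on right-censored RFS data. The inputs are follow-up times, event indicators (1 = recurrence observed, 0 = censored), and model risk scores, where a higher score means earlier expected recurrence.
- ROC AUC for binary HPV labels.

The function names are illustrative. Tied survival times are simply skipped, which is only one of several conventions.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) for two binary masks given
    as flat, equally sized sequences of 0/1 voxel labels."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before times[j]. The pair is concordant when
    the model assigns i the higher risk; tied risk scores count as half."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case (e.g. HPV+)
    scores higher than a random negative case; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, `dice([1, 1, 0, 0], [1, 0, 1, 0])` returns 0.5, and `roc_auc([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.7])` returns 0.75. The pairwise loops are quadratic in the number of patients. That cost is acceptable at the scale of about 1,000 cases.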
Abstract
We describe a publicly available multimodal dataset of annotated Positron Emission Tomography/Computed Tomography (PET/CT) studies for head and neck cancer research. The dataset includes 1,123 FDG-PET/CT studies from patients with histologically confirmed head and neck cancer, acquired at 10 international medical centers. All examinations consist of co-registered PET/CT scans acquired with varying protocols, reflecting real-world clinical diversity across institutions. Experienced radiation oncologists and radiologists manually segmented two structures: primary gross tumor volumes (GTVp) and the gross tumor volumes of involved lymph nodes (GTVn). They followed standardized guidelines and quality control measures. We provide anonymized NIfTI files of all studies, along with expert-annotated segmentation masks, radiotherapy dose distributions for a subset of patients, and comprehensive clinical metadata. This metadata includes TNM staging, HPV status, demographics (age and gender), long-term follow-up outcomes, survival times, censoring indicators, and treatment information. We demonstrate how the dataset can be used for three key clinical tasks: automated tumor segmentation, recurrence-free survival prediction, and HPV status classification. We also provide benchmark results using state-of-the-art deep learning models, including UNet, SegResNet, and multimodal prognostic frameworks.
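The sketch below shows one way to use the segmentation masks: it computes GTVp and GTVn volumes from a flattened label mask and its voxel spacing. Several details are hypothetical and are not taken from the dataset documentation:

- the label values (0 = background, 1 = GTVp, 2 = GTVn);
- the `LABELS` mapping;
- the function name `structure_volumes_ml`.

In practice the voxel labels and spacing would be read from the NIfTI files, for example with nibabel.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# Hypothetical label convention, assumed for illustration only:
# 0 = background, 1 = GTVp (primary tumor), 2 = GTVn (involved nodes).
# Check the dataset's own documentation for the actual label values.
LABELS: Dict[int, str] = {1: "GTVp", 2: "GTVn"}


def structure_volumes_ml(mask: Sequence[float],
                         spacing_mm: Tuple[float, float, float],
                         labels: Dict[int, str] = LABELS) -> Dict[str, float]:
    """Volume in millilitres of each labelled structure in a flattened mask.

    `spacing_mm` is the voxel size along each axis in millimetres. With a
    NIfTI file it would typically come from the image header (for example
    nibabel's `img.header.get_zooms()`), and the voxel labels from
    `img.get_fdata()` flattened; values are cast to int here."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    counts = Counter(int(v) for v in mask)
    # 1 mL = 1 cm^3 = 1000 mm^3
    return {name: counts.get(label, 0) * voxel_mm3 / 1000.0
            for label, name in labels.items()}
```

On a 2 mm isotropic grid, each voxel is 8 mm³. The call `structure_volumes_ml([0, 1, 1, 1, 2, 2, 0, 0], (2.0, 2.0, 2.0))` therefore reports about 0.024 mL for GTVp and 0.016 mL for GTVn.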