Hemorica: A Comprehensive CT Scan Dataset for Automated Brain Hemorrhage Classification, Segmentation, and Detection

📅 2025-09-26
🤖 AI Summary
Existing publicly available brain CT datasets are fragmented and coarsely annotated, which hinders the development of AI models for intracranial hemorrhage (ICH) diagnosis. To address this, the authors introduce Hemorica, an open-source benchmark dataset designed for fine-grained ICH analysis. It comprises 372 non-contrast head CT scans covering all five ICH subtypes, with multi-level annotations: patient-level and slice-level labels, 2D and 3D segmentation masks, and bounding boxes. Annotations followed a double-reading workflow with arbitration by a neurosurgeon, which kept inter-rater variability low. Hemorica supports both multi-task learning and curriculum learning for ICH analysis. In the benchmark experiments, MobileViT-XS achieves an F1 score of 87.8% on binary slice classification, while a U-Net with a DenseNet161 encoder attains a Dice score of 85.5% on binary lesion segmentation. The authors take these results as evidence of the dataset's annotation quality and of its utility for ICH-focused medical AI research.

📝 Abstract
Timely diagnosis of intracranial hemorrhage (ICH) on computed tomography (CT) scans remains a clinical priority, yet the development of robust artificial intelligence (AI) solutions is still hindered by fragmented public data. To close this gap, we introduce Hemorica, a publicly available collection of 372 head CT examinations acquired between 2012 and 2024. Each scan has been exhaustively annotated for five ICH subtypes: epidural (EPH), subdural (SDH), subarachnoid (SAH), intraparenchymal (IPH), and intraventricular (IVH). The annotations comprise patient-wise and slice-wise classification labels, subtype-specific bounding boxes, two-dimensional pixel masks, and three-dimensional voxel masks. A double-reading workflow, preceded by a pilot consensus phase and supported by neurosurgeon adjudication, maintained low inter-rater variability. Comprehensive statistical analysis confirms the clinical realism of the dataset. To establish reference baselines, standard convolutional and transformer architectures were fine-tuned for binary slice classification and hemorrhage segmentation. With only minimal fine-tuning, lightweight models such as MobileViT-XS achieved an F1 score of 87.8% in binary classification, whereas a U-Net with a DenseNet161 encoder reached a Dice score of 85.5% for binary lesion segmentation; these results validate both the quality of the annotations and the sufficiency of the sample size. Hemorica therefore offers a unified, fine-grained benchmark that supports multi-task and curriculum learning, facilitates transfer to larger but weakly labelled cohorts, and eases the design of AI-based assistants for ICH detection and quantification.
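
The two metrics quoted in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code. The function names, the toy inputs, and the handling of empty cases (F1 returns 0.0 when there are no positives at all, Dice returns 1.0 when both masks are empty) are assumptions made for illustration. On binary data the two formulas coincide: Dice is F1 computed over pixels instead of slices.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 = 2*TP / (2*TP + FP + FN) over per-slice labels (1 = hemorrhage)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumption: report 0.0 when there are no positives in truth or prediction.
    return 2 * tp / denom if denom else 0.0


def dice_score(mask_true: Sequence[Sequence[int]],
               mask_pred: Sequence[Sequence[int]]) -> float:
    """Dice = 2*|A and B| / (|A| + |B|) for two binary 2D masks of equal shape."""
    inter = 0
    total = 0
    for row_t, row_p in zip(mask_true, mask_pred):
        for t, p in zip(row_t, row_p):
            inter += t & p   # pixel counted only if set in both masks
            total += t + p   # |A| + |B|
    # Assumption: two empty masks count as a perfect match.
    return 2 * inter / total if total else 1.0


# Toy slice-level labels (hypothetical, not taken from Hemorica).
slice_true = [1, 1, 0, 0, 1]
slice_pred = [1, 0, 0, 1, 1]
print(f1_score(slice_true, slice_pred))

# Toy 2x3 lesion masks (hypothetical).
gt_mask = [[0, 1, 1], [0, 1, 0]]
pred_mask = [[0, 1, 0], [0, 1, 1]]
print(dice_score(gt_mask, pred_mask))
```

Whether the reported 85.5% Dice is averaged per slice, per scan, or pooled over all pixels is not stated in the abstract, so the aggregation level is left open here.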

Problem

Research questions and friction points this paper is trying to address.

Addressing fragmented public data for brain hemorrhage AI solutions
Providing comprehensive annotations for five intracranial hemorrhage subtypes
Establishing benchmark datasets for hemorrhage classification and segmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Public CT dataset with multi-type hemorrhage annotations
Double-reading workflow with neurosurgeon adjudication
Fine-tuned lightweight models for classification and segmentation
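
The annotation levels listed above are nested: a voxel mask implies per-slice pixel masks, which in turn imply bounding boxes, slice-wise labels, and a patient-wise label. The sketch below shows that relationship in plain Python. It is an illustration under stated assumptions, not the dataset's actual format: the integer encoding in `SUBTYPES`, the nested-list volume layout, and the choice of one box per subtype per slice (instead of one per connected lesion) are all hypothetical.

```python
# Hypothetical label encoding; the dataset's real encoding is not given here.
SUBTYPES = {1: "EPH", 2: "SDH", 3: "SAH", 4: "IPH", 5: "IVH"}


def slice_boxes(slice_mask):
    """Return {subtype: (x_min, y_min, x_max, y_max)} for one 2D label mask.

    Assumption: a single box per subtype enclosing all of its pixels in the slice.
    """
    boxes = {}
    for y, row in enumerate(slice_mask):
        for x, value in enumerate(row):
            if value == 0:  # 0 = background
                continue
            name = SUBTYPES[value]
            if name in boxes:
                x0, y0, x1, y1 = boxes[name]
                boxes[name] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
            else:
                boxes[name] = (x, y, x, y)
    return boxes


def slice_labels(volume):
    """Slice-wise labels: the subtypes present in each slice of the volume."""
    return [sorted(slice_boxes(s)) for s in volume]


def patient_labels(volume):
    """Patient-wise label: the union of subtypes over all slices."""
    return sorted({name for labels in slice_labels(volume) for name in labels})


# Toy 3-slice volume of 2x3 label masks (hypothetical data).
volume = [
    [[0, 0, 0], [0, 0, 0]],   # no hemorrhage
    [[0, 4, 4], [0, 4, 0]],   # IPH only
    [[2, 0, 0], [0, 0, 5]],   # SDH and IVH
]
print(slice_boxes(volume[1]))
print(slice_labels(volume))
print(patient_labels(volume))
```

Deriving the coarser labels from the finest one keeps the levels consistent with each other, which is one reason a dataset with voxel masks can also serve classification and detection tasks.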