Sex-based Bias Inherent in the Dice Similarity Coefficient: A Model Independent Analysis for Multiple Anatomical Structures

📅 2025-09-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies a sex-based bias intrinsic to the Dice Similarity Coefficient (DSC): because organ volumes are smaller on average in women, segmentation errors of identical absolute magnitude (e.g., a 1 mm boundary shift) yield systematically lower DSC values in female subjects. Average differences are around 0.03 for small structures and around 0.01 for medium-sized structures, and are close to zero for large structures such as the lungs and liver. The bias stems from the metric itself and does not depend on any specific segmentation model. Method: The authors apply equally sized synthetic errors to manual MRI annotations from 50 participants and quantify the resulting sex-based differences in the DSC and the normalized DSC. Contribution/Results: The analysis shows that fairness studies comparing DSC values between men and women should not expect identical scores, particularly for small structures, since a model may perform equally well in terms of error magnitude even when observed DSC values suggest otherwise. The findings highlight a previously underexplored limitation of DSC-based evaluation in medical image segmentation.

📝 Abstract
Overlap-based metrics such as the Dice Similarity Coefficient (DSC) penalize segmentation errors more heavily in smaller structures. As organ size differs by sex, this implies that a segmentation error of equal magnitude may result in lower DSCs in women due to their smaller average organ volumes compared to men. While previous work has examined sex-based differences in models or datasets, no study has yet investigated the potential bias introduced by the DSC itself. This study quantifies sex-based differences of the DSC and the normalized DSC in an idealized setting independent of specific models. We applied equally sized synthetic errors to manual MRI annotations from 50 participants to ensure sex-based comparability. Even minimal errors (e.g., a 1 mm boundary shift) produced systematic DSC differences between sexes. For small structures, average DSC differences were around 0.03; for medium-sized structures around 0.01. Only large structures (i.e., lungs and liver) were mostly unaffected, with sex-based DSC differences close to zero. These findings underline that fairness studies using the DSC as an evaluation metric should not expect identical scores between men and women, as the metric itself introduces bias. A segmentation model may perform equally well across sexes in terms of error magnitude, even if observed DSC values suggest otherwise. Importantly, our work raises awareness of a previously underexplored source of sex-based differences in segmentation performance, one that arises not from model behavior, but from the metric itself. Recognizing this factor is essential for more accurate and fair evaluations in medical image analysis.
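The size dependence described in the abstract can be illustrated with a small closed-form sketch. It assumes an idealized spherical structure whose predicted boundary lies a fixed distance outside the true boundary, so the prediction fully contains the reference. The spherical shape, the radii, and the function name `dsc_sphere_dilated` are illustrative assumptions and are not taken from the study.

```python
def dsc_sphere_dilated(radius_mm: float, error_mm: float) -> float:
    """DSC between a sphere and the same sphere grown outward by error_mm.

    The prediction contains the reference, so the intersection equals the
    reference volume. The common factor 4/3 * pi cancels, leaving cubed radii.
    """
    v_ref = radius_mm ** 3
    v_pred = (radius_mm + error_mm) ** 3
    return 2.0 * v_ref / (v_ref + v_pred)


# Hypothetical radii standing in for small, medium, and large structures.
for radius in (10.0, 20.0, 60.0):
    print(f"radius {radius:5.1f} mm -> DSC {dsc_sphere_dilated(radius, 1.0):.4f}")
```

With the same 1 mm boundary error, the DSC rises toward 1 as the radius grows, which is why a structure that is smaller on average receives a lower score for an identical error.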
Problem

Research questions and friction points this paper is trying to address.

Investigates inherent sex bias in Dice Similarity Coefficient metrics
Analyzes how DSC penalizes errors differently by organ size and sex
Quantifies metric-induced bias independent of specific segmentation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used synthetic errors on manual MRI annotations
Analyzed Dice coefficient bias independent of models
Quantified sex-based DSC differences across organ sizes
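The idea of applying an equal-magnitude synthetic error to annotations of different sizes can be sketched on voxel masks. This is a minimal illustration under stated assumptions, not the authors' pipeline: the masks are synthetic balls rather than MRI annotations, the error is a one-voxel translation (whether that corresponds to 1 mm depends on the voxel spacing), and the helper names `dice`, `ball_mask`, and `shifted` are hypothetical.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient of two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def ball_mask(radius_vox: int, size: int) -> np.ndarray:
    """Binary ball centered in a cubic grid (stand-in for an annotation)."""
    c = size // 2
    z, y, x = np.ogrid[:size, :size, :size]
    return (z - c) ** 2 + (y - c) ** 2 + (x - c) ** 2 <= radius_vox ** 2


def shifted(mask: np.ndarray, voxels: int = 1, axis: int = 0) -> np.ndarray:
    """Translate the mask; np.roll wraps around, so an empty margin is assumed."""
    return np.roll(mask, voxels, axis=axis)


# Same synthetic error (one-voxel shift) applied to a small and a large structure.
small = ball_mask(radius_vox=8, size=64)
large = ball_mask(radius_vox=24, size=64)
dsc_small = dice(small, shifted(small))
dsc_large = dice(large, shifted(large))
print(f"small: {dsc_small:.4f}  large: {dsc_large:.4f}")
```

Because the error is identical for both masks, any gap between `dsc_small` and `dsc_large` comes from the metric's dependence on structure size rather than from a difference in error magnitude.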
Authors
Hartmut Häntze
Charité - Universitätsmedizin Berlin, 12203 Berlin, Germany
Myrthe Buser
Radboudumc, 6525 GA Nijmegen, Netherlands
Alessa Hering
Radboud University Medical Center
Deep Learning, Image Registration, Tumor Follow-Up, LLM
Lisa C. Adams
Klinikum rechts der Isar, Technical University of Munich, 81675 Munich, Germany
Keno K. Bressem
Klinikum rechts der Isar, Technical University of Munich, 81675 Munich, Germany