What Can We Learn from Inter-Annotator Variability in Skin Lesion Segmentation?

📅 2025-08-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Inter-annotator variability (IAV) in medical image segmentation arises in part from ambiguous lesion boundaries, such as those of spiculated or infiltrative nodules, which are often associated with malignancy, yet its clinical correlation remains underexplored. Method: This work presents the first systematic investigation of the relationship between IAV and lesion malignancy in dermoscopic imaging. We introduce IMA++, the largest multi-annotator skin lesion segmentation dataset to date, comprising 1,200 cases annotated by five dermatology experts. We propose a deep multi-task learning framework that jointly performs lesion segmentation and IAV prediction, modeling annotation consistency as a learnable "soft" clinical feature. The framework integrates Dice loss with consistency regularization to enhance robustness. Contribution/Results: Experiments show a 4.2% average improvement in balanced accuracy across multiple model architectures and across IMA++ and four public dermoscopic datasets, and a mean absolute error of 0.108 when predicting inter-annotator agreement (IAA) directly from images, improving reliability and interpretability for computer-aided skin cancer diagnosis.

📝 Abstract

Medical image segmentation exhibits intra- and inter-annotator variability due to ambiguous object boundaries, annotator preferences, expertise, and tools, among other factors. Lesions with ambiguous boundaries, e.g., spiculated or infiltrative nodules, or irregular borders per the ABCD rule, are particularly prone to disagreement and are often associated with malignancy. In this work, we curate IMA++, the largest multi-annotator skin lesion segmentation dataset, on which we conduct an in-depth study of variability due to annotator, malignancy, tool, and skill factors. We find a statistically significant (p<0.001) association between inter-annotator agreement (IAA), measured using Dice, and the malignancy of skin lesions. We further show that IAA can be accurately predicted directly from dermoscopic images, achieving a mean absolute error of 0.108. Finally, we leverage this association by utilizing IAA as a "soft" clinical feature within a multi-task learning objective, yielding a 4.2% improvement in balanced accuracy averaged across multiple model architectures and across IMA++ and four public dermoscopic datasets. The code is available at https://github.com/sfu-mial/skin-IAV.
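The abstract states that IAA is measured using Dice but does not say how scores are aggregated over annotators. The sketch below assumes one plausible choice, the mean Dice over all annotator pairs for a single image. The function names `dice` and `pairwise_iaa` and the toy masks are illustrative and are not taken from the paper or its repository.

```python
from itertools import combinations

import numpy as np


def dice(a, b):
    """Dice overlap between two binary masks (1.0 when both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def pairwise_iaa(masks):
    """Mean Dice over all annotator pairs (assumed aggregation, see lead-in)."""
    if len(masks) < 2:
        raise ValueError("need at least two masks to measure agreement")
    scores = [dice(a, b) for a, b in combinations(masks, 2)]
    return float(np.mean(scores))


# Toy example: three annotators outlining the same 4x4 image.
m1 = np.zeros((4, 4), dtype=bool)
m1[1:3, 1:3] = True          # 4 foreground pixels
m2 = np.zeros((4, 4), dtype=bool)
m2[1:3, 1:4] = True          # 6 foreground pixels, 4 shared with m1
m3 = m1.copy()               # identical to the first annotator

iaa = pairwise_iaa([m1, m2, m3])
print(round(iaa, 4))
```

Here the pairwise scores are 0.8, 1.0, and 0.8, so the image-level agreement is their mean. A lesion with a ragged or faint border would typically yield lower pairwise scores than one with a crisp outline.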

Problem

Research questions and friction points this paper is trying to address.

Study inter-annotator variability in skin lesion segmentation

Analyze how annotator, malignancy, tool, and skill factors affect variability

Improve lesion classification by using inter-annotator agreement as a "soft" clinical feature
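The classification gain in this work is reported in balanced accuracy, which is the mean of per-class recall and is therefore less flattering than plain accuracy on imbalanced data such as benign-versus-malignant lesions. A minimal sketch of the metric follows; the labels are made up for illustration and are not results from the paper.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, computed over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


# Hypothetical labels: 8 benign (0) and 2 malignant (1) lesions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]   # one malignant case missed

plain_acc = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
bal_acc = balanced_accuracy(y_true, y_pred)
print(plain_acc, bal_acc)
```

In this toy case plain accuracy is 0.9, while balanced accuracy is 0.75 because half of the malignant cases are missed.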

Innovation

Methods, ideas, or system contributions that make the work stand out.

IMA++, the largest multi-annotator skin lesion segmentation dataset

Predict inter-annotator agreement from images

Multi-task learning with inter-annotator agreement as a "soft" clinical feature
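One way to read the multi-task idea above is as a diagnosis loss plus an auxiliary regression term on the agreement score. The page does not give the exact objective, so everything in the sketch below is an assumption: cross-entropy for diagnosis, an L1 penalty on predicted agreement, and a hypothetical weighting parameter `weight`. It uses NumPy only to stay self-contained and is not the authors' implementation.

```python
import numpy as np


def multitask_loss(logits, labels, iaa_pred, iaa_true, weight=0.5):
    """Cross-entropy on diagnosis plus weighted L1 on agreement (assumed form)."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Numerically stable log-softmax over the class dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()

    # Auxiliary task: regress the per-image agreement score in [0, 1].
    mae = np.abs(np.asarray(iaa_pred, dtype=float)
                 - np.asarray(iaa_true, dtype=float)).mean()

    return float(ce + weight * mae), float(ce), float(mae)


# Toy batch of two images with uninformative logits.
total, ce, mae = multitask_loss(
    logits=[[0.0, 0.0], [0.0, 0.0]],
    labels=[0, 1],
    iaa_pred=[0.9, 0.6],
    iaa_true=[0.8, 0.8],
)
print(round(total, 4), round(ce, 4), round(mae, 4))
```

With equal logits the cross-entropy equals ln 2, and the auxiliary term is the mean absolute error of the two agreement predictions. In a real model both heads would share an image encoder, so the agreement signal can shape the features used for diagnosis.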

🔎 Similar Papers

Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM Empowerment