🤖 AI Summary
This study addresses the scarcity of high-quality, openly accessible image datasets in dermatopathology, which hinders both clinical education and machine learning research. To overcome this limitation, the authors propose a hybrid AI workflow that combines deep learning–based image classification with textual analysis of figure captions to automatically retrieve, filter, and annotate dermatopathological images from PubMed Central. Expert review is built into the pipeline, making dataset curation semi-automated. The result is DermpathNet, an open-access dataset comprising 7,772 images spanning 166 diagnostic categories. On 651 manually annotated validation images, the hybrid filtering approach achieves an F-score of 90.4%, which reflects a good balance of precision and recall. Beyond providing a resource for education and algorithm development, the work also highlights the performance limitations of general-purpose AI models in the specialized domain of dermatopathology.
📝 Abstract
Accessing high-quality, open-access dermatopathology image datasets for learning and cross-referencing is a common challenge for clinicians and trainees. To establish a comprehensive open-access dermatopathology dataset for educational, cross-referencing, and machine-learning purposes, we employed a hybrid workflow to curate and categorize images from the PubMed Central (PMC) repository. We used specific keywords to extract relevant images and classified them using a novel hybrid method that combined deep-learning-based image modality classification with figure caption analysis. Validation on 651 manually annotated images demonstrated the robustness of our workflow, with an F-score of 89.6% for the deep learning approach, 61.0% for the keyword-based retrieval method, and 90.4% for the hybrid approach. We retrieved over 7,772 images across 166 diagnoses and released this fully annotated dataset, reviewed by board-certified dermatopathologists. Using our dataset as a challenging test task, we found the current image analysis algorithm from OpenAI inadequate for analyzing dermatopathology images. In conclusion, we have developed DermpathNet, a large, peer-reviewed, open-access dermatopathology image dataset built with a semi-automated curation workflow.
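The abstract does not spell out how the image classifier and the caption analysis are combined, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes the classifier outputs a probability that a figure is a histology image, that confident predictions are accepted outright, and that borderline ones are settled by a caption keyword match. The keyword list, the thresholds, and the names `Figure`, `hybrid_keep`, and `f_score` are all hypothetical. The F-score function uses the standard definition (harmonic mean of precision and recall), which is the metric reported above.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical caption keywords; the abstract does not list the actual terms.
CAPTION_KEYWORDS = ("h&e", "hematoxylin", "histopathology", "biopsy", "magnification")


@dataclass
class Figure:
    caption: str
    image_score: float  # assumed classifier probability of "histology image"


def caption_match(caption: str) -> bool:
    """Return True if the caption contains any of the assumed keywords."""
    text = caption.lower()
    return any(keyword in text for keyword in CAPTION_KEYWORDS)


def hybrid_keep(fig: Figure, high: float = 0.9, low: float = 0.5) -> bool:
    """Assumed combination rule (not taken from the paper).

    Accept confident classifier predictions directly, reject clearly
    low scores, and let the caption decide the borderline cases.
    """
    if fig.image_score >= high:
        return True
    if fig.image_score >= low:
        return caption_match(fig.caption)
    return False


def f_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 score: harmonic mean of precision and recall."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a validation setting like the one described, `hybrid_keep` would be applied to each manually annotated figure and the resulting decisions passed to `f_score` together with the expert labels. The same function can score the image-only and caption-only filters for comparison.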