🤖 AI Summary
Accurate identification of atypical mitotic figures (AMFs) in breast cancer histopathology remains challenging because of their morphological ambiguity and the scarcity of high-quality, expert-annotated datasets. Method: We introduce AMi-Br, the first publicly available dataset of atypical and normal mitotic figures. It was built by subclassifying all mitotic figures from the MIDOG 2021 and TUPAC datasets by majority vote of three experts, and it comprises 3,720 mitotic figures (832 atypical, 2,888 normal) from 223 tumor cases. To assess dataset consistency, we benchmark baseline classifiers using Monte Carlo cross-validation and several class-imbalance mitigation strategies. We evaluate both a conventional patch-level split and a stricter patient-level split, which better reflects generalization to unseen cases. Contribution/Results: The baselines reach an average balanced accuracy of up to 0.806 with the patch-level split and up to 0.713 with the patient-level split. These results characterize the dataset's consistency and provide a reference point for future work on AMFs, which recent studies suggest may be an independent prognostic criterion in breast cancer.
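The difference between the two evaluation settings comes down to how mitotic figures are grouped before splitting. The following is a minimal Python sketch, assuming each figure is represented as a hypothetical `(patient_id, label)` tuple. The function names, the 20% test fraction, and the number of repeats are illustrative choices, not the authors' protocol. In a patch-level split, figures from one patient can fall on both sides, so a model may be tested on tissue very similar to what it trained on. A patient-level split keeps each patient's figures together.

```python
import random
from typing import Iterator, List, Tuple

# Hypothetical record layout: (patient_id, label); label 1 = atypical MF, 0 = normal MF.
Sample = Tuple[str, int]
Split = Tuple[List[Sample], List[Sample]]


def patch_level_split(samples: List[Sample], test_frac: float, rng: random.Random) -> Split:
    """Shuffle individual MFs; one patient's MFs can end up in both train and test."""
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_test = round(len(idx) * test_frac)
    test = [samples[i] for i in idx[:n_test]]
    train = [samples[i] for i in idx[n_test:]]
    return train, test


def patient_level_split(samples: List[Sample], test_frac: float, rng: random.Random) -> Split:
    """Shuffle patient IDs; all MFs of a patient go to the same side of the split."""
    patients = sorted({pid for pid, _ in samples})  # sorted for reproducibility
    rng.shuffle(patients)
    n_test = round(len(patients) * test_frac)
    test_patients = set(patients[:n_test])
    train = [s for s in samples if s[0] not in test_patients]
    test = [s for s in samples if s[0] in test_patients]
    return train, test


def monte_carlo_splits(samples: List[Sample], n_repeats: int = 5, test_frac: float = 0.2,
                       seed: int = 0, level: str = "patient") -> Iterator[Split]:
    """Yield n_repeats independent random train/test splits (Monte Carlo cross-validation).

    The defaults are illustrative assumptions, not the settings used for AMi-Br.
    """
    rng = random.Random(seed)
    split = patient_level_split if level == "patient" else patch_level_split
    for _ in range(n_repeats):
        yield split(samples, test_frac, rng)


if __name__ == "__main__":
    # Toy data: 10 hypothetical patients with 3 MFs each (not the AMi-Br data).
    toy = [(f"p{p}", int(k == 0)) for p in range(10) for k in range(3)]
    for train, test in monte_carlo_splits(toy, n_repeats=3, level="patient"):
        shared = {pid for pid, _ in train} & {pid for pid, _ in test}
        print(len(train), len(test), "shared patients:", len(shared))
```

Drawing a fresh random split on every repeat, rather than using fixed folds, is what makes this Monte Carlo cross-validation. Metrics are then averaged over the repeats.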
📝 Abstract
The density of mitotic figures (MFs) in histologic tumor sections is an important prognostic marker for many tumor types, including breast cancer. Multiple recent works have reported that the number of MFs with an atypical morphology (atypical MFs, AMFs) might be an independent prognostic criterion for breast cancer. AMFs are an indicator of mutations in the genes regulating the cell cycle and can lead to aberrant chromosome constitution (aneuploidy) of the tumor cells. To facilitate further research on this topic using pattern recognition, we present the first publicly available dataset of atypical and normal MFs (AMi-Br). For this, we used two of the most popular MF datasets (MIDOG 2021 and TUPAC) and subclassified all MFs by majority vote of three experts. Our final dataset consists of 3,720 MFs, split into 832 AMFs (22.4%) and 2,888 normal MFs (77.6%) across all 223 tumor cases in the combined set. We provide baseline classification experiments to investigate the consistency of the dataset, using Monte Carlo cross-validation and different strategies to combat class imbalance. We found an average balanced accuracy of up to 0.806 with a patch-level dataset split and up to 0.713 with a patient-level split.
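Because AMFs make up only 22.4% of the figures, plain accuracy is misleading. A classifier that labels every figure as normal already scores 2,888 / 3,720 ≈ 0.776 while never detecting an AMF. Balanced accuracy, the mean of per-class recall, scores that same classifier at 0.5. The Python sketch below computes it and also shows inverse-frequency class weights, one common way to counter imbalance in a weighted loss. The abstract does not say which imbalance strategies were used, so this weighting is an illustrative assumption, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Class weights n_total / (n_classes * n_c) for a weighted loss.

    Assumption: this is one common imbalance strategy; the source does not specify its own.
    """
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    return {c: n_total / (n_classes * n) for c, n in counts.items()}


if __name__ == "__main__":
    # Label counts from the abstract: 832 AMFs (label 1) and 2,888 normal MFs (label 0).
    labels = [1] * 832 + [0] * 2888
    always_normal = [0] * len(labels)  # trivial baseline that never predicts an AMF
    plain_acc = sum(t == p for t, p in zip(labels, always_normal)) / len(labels)
    print(f"accuracy={plain_acc:.3f}, balanced={balanced_accuracy(labels, always_normal):.3f}")
    # -> accuracy=0.776, balanced=0.500
    print(inverse_frequency_weights(labels))
```

With these weights, both classes contribute the same total weight (832 × 2.236 ≈ 2,888 × 0.644 ≈ 1,860), so errors on the rare AMF class are not swamped by the majority class.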