Is Synthetic Image Augmentation Useful for Imbalanced Classification Problems? Case-Study on the MIDOG2025 Atypical Cell Detection Competition

📅 2025-08-30
🤖 AI Summary
This work addresses the classification of atypical mitotic figures in histopathology images, a rare but clinically relevant class, under strong class imbalance and cross-domain shift. The study evaluates synthetic data augmentation and pretraining strategies using two backbones, ConvNeXt-Small pretrained on ImageNet and Lunit's self-supervised histopathology ViT, with five-fold cross-validation on real-only and real+synthetic training data. Both models reach a mean AUROC of approximately 95%: ConvNeXt achieves slightly higher peaks, Lunit is more stable across folds, and naive synthetic balancing yields no consistent improvement. On the organizers' preliminary hidden test set, an out-of-distribution debug subset, ConvNeXt attains the highest AUROC (95.4%), while Lunit remains competitive on balanced accuracy. The results suggest that domain-specific pretraining confers robustness, whereas ImageNet pretraining reaches higher peak performance on this imbalanced medical imaging task.

📝 Abstract
The MIDOG 2025 challenge extends prior work on mitotic figure detection by introducing a new Track 2 on atypical mitosis classification. This task aims to distinguish normal from atypical mitotic figures in histopathology images, a clinically relevant but highly imbalanced and cross-domain problem. We investigated two complementary backbones: (i) ConvNeXt-Small, pretrained on ImageNet, and (ii) a histopathology-specific ViT from Lunit trained via self-supervision. To address the strong prevalence imbalance (9408 normal vs. 1741 atypical), we synthesized additional atypical examples to approximate class balance and compared models trained with real-only vs. real+synthetic data. Using five-fold cross-validation, both backbones reached strong performance (mean AUROC approximately 95 percent), with ConvNeXt achieving slightly higher peaks while Lunit exhibited greater fold-to-fold stability. Synthetic balancing, however, did not lead to consistent improvements. On the organizers' preliminary hidden test set, explicitly designed as an out-of-distribution debug subset, ConvNeXt attained the highest AUROC (95.4 percent), whereas Lunit remained competitive on balanced accuracy. These findings suggest that both ImageNet and domain-pretrained backbones are viable for atypical mitosis classification, with domain-pretraining conferring robustness and ImageNet pretraining reaching higher peaks, while naive synthetic balancing has limited benefit. Full hidden test set results will be reported upon challenge completion.
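To make the reported imbalance concrete, the short sketch below works through the class counts given in the abstract. It assumes that "approximate class balance" means 1:1 parity between the two classes; the abstract does not state the exact number of synthetic images generated, so `synthetic_needed` is an illustrative upper bound rather than the authors' figure.

```python
# Class counts as reported in the abstract.
n_normal = 9408
n_atypical = 1741

total = n_normal + n_atypical
imbalance_ratio = n_normal / n_atypical
atypical_prevalence = n_atypical / total

# Assumption: "approximate class balance" is read here as exact 1:1 parity.
synthetic_needed = n_normal - n_atypical

print(f"total examples:       {total}")
print(f"normal : atypical     {imbalance_ratio:.2f} : 1")
print(f"atypical prevalence:  {atypical_prevalence:.1%}")
print(f"synthetic for parity: {synthetic_needed}")
```

Under that assumption, reaching parity would require more than four synthetic atypical images for every real one, which gives a sense of how heavily a balanced training set would lean on generated data.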
Problem

Research questions and friction points this paper is trying to address.

Distinguishing normal versus atypical mitotic figures in histopathology images
Addressing highly imbalanced classification with cross-domain challenges
Evaluating synthetic image augmentation for class imbalance mitigation
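The abstract reports AUROC and balanced accuracy rather than plain accuracy. The following minimal sketch, written with the standard library only, illustrates why: with the reported class counts, a trivial classifier that always predicts "normal" looks strong on accuracy but sits at chance level on balanced accuracy. The function names and the toy scores are illustrative assumptions, not the challenge's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = atypical)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of samples,
    so this is only meant for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Trivial baseline on the reported class counts: always predict "normal".
y_true = [0] * 9408 + [1] * 1741
always_normal = [0] * len(y_true)
acc = accuracy(y_true, always_normal)
bal = balanced_accuracy(y_true, always_normal)
print(f"accuracy={acc:.3f}  balanced_accuracy={bal:.3f}")

# Toy AUROC example with made-up scores.
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"toy AUROC={toy_auc:.2f}")
```

AUROC is threshold-free, whereas balanced accuracy depends on the chosen decision threshold, which is one reason two models can rank differently on the two metrics.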
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used ConvNeXt and domain-specific ViT backbones
Synthesized atypical examples for class balance
Compared real-only versus real plus synthetic data
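The real-only versus real+synthetic comparison under five-fold cross-validation could be organized roughly as sketched below. This is a hypothetical setup, not the authors' pipeline: it assumes stratified folds over the real data and assumes that synthetic atypical images are added to the training split only, so each validation fold stays real-only. The helper `stratified_folds` and the per-fold parity target are illustrative choices.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds that each preserve the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

labels = [0] * 9408 + [1] * 1741  # counts from the abstract; 1 = atypical
folds = stratified_folds(labels, k=5)

plan = []
for f, val_idx in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    n_atyp = sum(labels[i] for i in train_idx)
    n_norm = len(train_idx) - n_atyp
    # Assumption: synthetic atypical images top up the training split only,
    # so every validation fold remains real-only.
    n_synth = n_norm - n_atyp
    plan.append((len(train_idx), len(val_idx), n_synth))
    print(f"fold {f}: train={len(train_idx)} val={len(val_idx)} synthetic_to_add={n_synth}")
```

Keeping validation real-only means both training regimes are scored on the same real images in each fold. If the image generator were itself fit on real atypical examples, it would also need to be restricted to the training folds to avoid leakage; the abstract does not say how this was handled.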
Leire Benito-Del-Valle
TECNALIA, Basque Research and Technology Alliance (BRTA), Parque Tecnológico de Bizkaia, C/ Geldo. Edificio 700, E-48160 Derio - Bizkaia (Spain)
Pedro A. Moreno-Sánchez
TECNALIA, Basque Research and Technology Alliance (BRTA), Parque Tecnológico de Bizkaia, C/ Geldo. Edificio 700, E-48160 Derio - Bizkaia (Spain)
Itziar Egusquiza
TECNALIA, Basque Research and Technology Alliance (BRTA), Parque Tecnológico de Bizkaia, C/ Geldo. Edificio 700, E-48160 Derio - Bizkaia (Spain)
Itsaso Vitoria
TECNALIA, Basque Research and Technology Alliance (BRTA), Parque Tecnológico de Bizkaia, C/ Geldo. Edificio 700, E-48160 Derio - Bizkaia (Spain)
Artzai Picón
TECNALIA, Basque Research and Technology Alliance (BRTA), Parque Tecnológico de Bizkaia, C/ Geldo. Edificio 700, E-48160 Derio - Bizkaia (Spain)
Cristina López-Saratxaga
TECNALIA, Basque Research and Technology Alliance (BRTA), Parque Tecnológico de Bizkaia, C/ Geldo. Edificio 700, E-48160 Derio - Bizkaia (Spain)
Adrian Galdran
Ramon y Cajal / Ikerbasque Research Fellow @ Tecnalia
Medical Computer Vision, Deep Learning for Biomedical Image Analysis