Stratify or Die: Rethinking Data Splits in Image Segmentation

📅 2025-09-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Random dataset splitting in image segmentation often induces label distribution shifts in the test set, compromising evaluation reliability and model generalization. To address stratification challenges arising from multi-label annotations, strong spatial correlations, and severe class imbalance, we propose two novel stratified splitting methods: Iterative Pixel Stratification (IPS) and Wasserstein-Driven Evolutionary Stratification (WDES). WDES is the first approach to embed the Wasserstein distance into a genetic algorithm framework to globally optimize pixel-level distribution alignment between training and test sets; it further introduces a statistical heterogeneity metric to quantify stratification quality. Extensive validation across diverse domains—including street-scene, medical, and satellite imagery—demonstrates that WDES significantly reduces model performance variance (average reduction of 32.7%) and enhances evaluation robustness under few-shot and long-tailed settings. Our work establishes a verifiable, generalizable theoretical and practical paradigm for data partitioning in semantic segmentation.

Technology Category

Application Category

📝 Abstract
Random splitting of datasets in image segmentation often leads to unrepresentative test sets, resulting in biased evaluations and poor model generalization. While stratified sampling has proven effective for addressing label distribution imbalance in classification tasks, extending these ideas to segmentation remains challenging due to the multi-label structure and class imbalance typically present in such data. Building on existing stratification concepts, we introduce Iterative Pixel Stratification (IPS), a straightforward, label-aware sampling method tailored for segmentation tasks. Additionally, we present Wasserstein-Driven Evolutionary Stratification (WDES), a novel genetic algorithm designed to minimize the Wasserstein distance, thereby optimizing the similarity of label distributions across dataset splits. We prove that WDES is globally optimal given enough generations. Using newly proposed statistical heterogeneity metrics, we evaluate both methods against random sampling and find that WDES consistently produces more representative splits. Applying WDES across diverse segmentation tasks, including street scenes, medical imaging, and satellite imagery, leads to lower performance variance and improved model evaluation. Our results also highlight the particular value of WDES in handling small, imbalanced, and low-diversity datasets, where conventional splitting strategies are most prone to bias.
Problem

Research questions and friction points this paper is trying to address.

Random dataset splits cause biased evaluations in image segmentation
Extending stratified sampling to segmentation is challenging due to multi-label imbalance
Conventional splitting strategies fail with small imbalanced datasets causing poor generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative Pixel Stratification for segmentation tasks
Wasserstein-Driven Evolutionary Stratification genetic algorithm
Minimizing Wasserstein distance for label distribution similarity
🔎 Similar Papers
No similar papers found.
N
Naga Venkata Sai Jitin Jami
Machine Learning and Data Analytics Lab, FAU Erlangen-Nürnberg, Germany
T
Thomas Altstidl
Machine Learning and Data Analytics Lab, FAU Erlangen-Nürnberg, Germany
Jonas Mueller
Jonas Mueller
Cleanlab
Trustworthy AIMachine LearningStatisticsComputational Biology
J
Jindong Li
Machine Learning and Data Analytics Lab, FAU Erlangen-Nürnberg, Germany
Dario Zanca
Dario Zanca
Head of Applied Machine Learning Group @ MaD Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg
Deep LearningHuman-inspired AIRobustnessAI PsychophysicsComputer Vision
B
Bjoern Eskofier
Machine Learning and Data Analytics Lab, FAU Erlangen-Nürnberg, Germany
Heike Leutheuser
Heike Leutheuser
Professor for AAL & Medical Assistance Systems, University of Bayreuth
Wearable ComputingBiosignal ProcessingTime series analysisMachine Learning