Right Regions, Wrong Labels: Semantic Label Flips in Segmentation under Correlation Shift

📅 2026-04-14

🤖 AI Summary
This work examines how spurious correlations learned during training leave semantic segmentation models vulnerable under distribution shift, where they can produce semantic label flips: regions segmented with largely correct boundaries but assigned the wrong foreground class. The study characterizes this failure mode by decomposing ground-truth foreground pixels into three groups: correctly labelled pixels, flipped-identity pixels (still predicted as foreground, but with the wrong class), and pixels missed to background. It measures these within-object swaps with a Flip diagnostic and introduces flip-risk, an entropy-based inference-time score that estimates label-flip susceptibility from foreground identity uncertainty without requiring ground-truth annotations. Experiments in a setting where object category and scene are correlated during training show that stronger correlation widens the gap between common and rare test conditions and increases label flips on counterfactual groups. Flip-risk can flag flip-prone cases at inference time, offering a practical tool for assessing segmentation robustness under distribution shift.
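
The three-way split of foreground pixels described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the authors' repository: it assumes integer label maps of equal shape with class 0 as background, and the helper name decompose_foreground is hypothetical. Normalizing each count by the number of ground-truth foreground pixels is also an assumption; the paper's Flip diagnostic may be normalized differently.

```python
import numpy as np

BACKGROUND = 0  # assumption: class 0 marks background pixels


def decompose_foreground(gt: np.ndarray, pred: np.ndarray,
                         background: int = BACKGROUND) -> dict:
    """Hypothetical helper: split ground-truth foreground pixels into
    correct, flipped-identity and missed-to-background fractions.

    Normalizing by the ground-truth foreground count is an assumption.
    """
    if gt.shape != pred.shape:
        raise ValueError("gt and pred must have the same shape")
    fg = gt != background                       # ground-truth foreground mask
    n_fg = int(fg.sum())
    if n_fg == 0:
        return {"correct": 0.0, "flip": 0.0, "missed": 0.0}
    correct = fg & (pred == gt)                 # right region, right label
    missed = fg & (pred == background)          # foreground lost to background
    flipped = fg & (pred != background) & (pred != gt)  # right region, wrong label
    return {
        "correct": float(correct.sum() / n_fg),
        "flip": float(flipped.sum() / n_fg),
        "missed": float(missed.sum() / n_fg),
    }


# Toy 2x3 example: pixel (1, 2) is a false positive on background,
# so it does not enter the foreground decomposition.
gt = np.array([[0, 1, 1],
               [2, 2, 0]])
pred = np.array([[0, 1, 2],
                 [2, 0, 1]])
print(decompose_foreground(gt, pred))
# → {'correct': 0.5, 'flip': 0.25, 'missed': 0.25}
```

Because every ground-truth foreground pixel falls into exactly one of the three groups, the fractions sum to one, so a drop in overlap can be attributed either to label swaps or to missed objects.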

📝 Abstract
The robustness of machine learning models can be compromised by spurious correlations between non-causal features in the input data and target labels. A common way to test for such correlations is to train on data where the label is strongly tied to some non-causal cue, then evaluate on examples where that tie no longer holds. This idea is well established for classification tasks, but for semantic segmentation the specific failure modes are not well understood. We show that a model may achieve reasonable overlap while assigning the wrong semantic label, swapping one plausible foreground class for another, even when object boundaries are largely correct. We focus on this semantic label-flip behaviour and quantify it with a simple diagnostic (Flip) that counts how often ground-truth foreground pixels are assigned the wrong foreground identity while remaining predicted as foreground. In a setting where category and scene are correlated during training, increasing the correlation consistently widens the gap between common and rare test conditions and increases these within-object label swaps on counterfactual groups. Overall, our results motivate assessing segmentation robustness under distribution shift beyond overlap by decomposing foreground errors into correct pixels, flipped-identity pixels, and missed-to-background pixels. We also propose an entropy-based, ground-truth-label-free 'flip-risk' score, which is computed from foreground identity uncertainty, and show that it can flag flip-prone cases at inference time. Code is available at https://github.com/acharaakshit/label-flips.
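
The abstract specifies flip-risk only as an entropy-based score computed from foreground identity uncertainty, so the following is a minimal sketch of one plausible reading, not the authors' definition (the linked repository is the authoritative source). It assumes a softmax output of shape (C, H, W) with channel 0 as background. For each pixel the model predicts as foreground, it renormalizes the probabilities over the foreground classes only, takes the entropy normalized by log K (K foreground classes), and averages over those pixels. The function name flip_risk, the renormalization step and the image-level mean are all hypothetical choices.

```python
import numpy as np


def flip_risk(probs: np.ndarray, background: int = 0, eps: float = 1e-12) -> float:
    """Hypothetical sketch, not the paper's exact formula.

    probs: softmax output of shape (C, H, W); channel `background` is
    assumed to be background. No ground-truth labels are needed.
    Returns the mean normalized foreground-identity entropy over
    predicted-foreground pixels (0 = confident identity, 1 = maximally uncertain).
    """
    n_classes = probs.shape[0]
    fg_idx = [c for c in range(n_classes) if c != background]
    if len(fg_idx) < 2:
        return 0.0  # with a single foreground class there is nothing to flip
    pred = probs.argmax(axis=0)
    fg_mask = pred != background                # pixels the model calls foreground
    if not fg_mask.any():
        return 0.0
    fg_probs = probs[fg_idx][:, fg_mask]        # (K, N) foreground-class scores
    # Renormalize over foreground identities only, separating "which object"
    # uncertainty from "object vs background" uncertainty.
    fg_probs = fg_probs / (fg_probs.sum(axis=0, keepdims=True) + eps)
    entropy = -(fg_probs * np.log(fg_probs + eps)).sum(axis=0)
    return float((entropy / np.log(len(fg_idx))).mean())


# Toy 1x2 image: the first pixel is confidently class 1, the second is
# split evenly between classes 1 and 2.
probs = np.array([[[0.1, 0.2]],   # background
                  [[0.9, 0.4]],   # class 1
                  [[0.0, 0.4]]])  # class 2
print(round(flip_risk(probs), 3))
# → 0.5
```

Renormalizing over the foreground classes mirrors the Flip definition, where a pixel stays foreground but takes the wrong identity. A case could then be flagged as flip-prone when the score exceeds a threshold; any such threshold (for example one tuned on held-out data) is likewise an assumption.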
Problem

Research questions and friction points this paper is trying to address.

semantic segmentation
label flip
correlation shift
distribution shift
spurious correlation
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic label flip
correlation shift
segmentation robustness
flip-risk
distribution shift
Akshit Achara
School of Biomedical Engineering & Imaging Sciences, King’s College London, UK
Yovin Yathathugoda
School of Biomedical Engineering & Imaging Sciences, King’s College London, UK
Nick Byrne
School of Biomedical Engineering & Imaging Sciences, King’s College London, UK
Michela Antonelli
School of Biomedical Engineering & Imaging Sciences, King’s College London, UK
Esther Puyol Anton
School of Biomedical Engineering & Imaging Sciences, King’s College London, UK
Alexander Hammers
School of Biomedical Engineering and Imaging Sciences, King's College London
Epilepsy, PET(-MRI), atlases, large axial field-of-view ("Total Body") PET
Andrew P. King
School of Biomedical Engineering & Imaging Sciences, King’s College London, UK