🤖 AI Summary
This work addresses the challenge of optimizing the CRF/Potts loss in scribble-supervised image segmentation, where extreme label sparsity hinders effective training. We propose a soft self-labeling framework that reformulates unsupervised CRF inference as a continuous relaxation problem, integrating both convex and non-convex relaxations; building on higher-order optimization techniques such as graph cuts, it derives a principled association loss between network predictions and soft pseudo-labels. Joint optimization via gradient descent explicitly models class uncertainty, thereby avoiding the error propagation inherent in hard pseudo-labeling. Our key contribution is a unified end-to-end training framework that jointly incorporates CRF relaxation and differentiable soft self-labeling. Applied to standard segmentation architectures using only scribble annotations, our method surpasses complex task-specific systems, and in certain settings even outperforms fully supervised baselines, demonstrating substantial improvements in both accuracy and generalization for weakly supervised segmentation.
📝 Abstract
We consider weakly supervised segmentation where only a fraction of pixels have ground-truth labels (scribbles) and focus on a self-labeling approach that optimizes relaxations of the standard unsupervised CRF/Potts loss on unlabeled pixels. While WSSS methods can directly optimize such losses via gradient descent, prior work suggests that higher-order optimization can improve network training by introducing hidden pseudo-labels and powerful CRF sub-problem solvers, e.g., graph cuts. However, the hard pseudo-labels used previously cannot represent class uncertainty or errors, which motivates soft self-labeling. We derive a principled auxiliary loss and systematically evaluate standard and new CRF relaxations (convex and non-convex), neighborhood systems, and terms connecting network predictions with soft pseudo-labels. We also propose a general continuous sub-problem solver. Using only standard architectures, soft self-labeling consistently improves scribble-based training and outperforms significantly more complex specialized WSSS systems; it can even outperform full pixel-precise supervision. Our general ideas apply to other weakly supervised problems and systems.
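To make the kind of objective discussed above concrete, here is a minimal NumPy sketch of a scribble-supervised loss with soft pseudo-labels: cross-entropy on the few scribbled pixels, an association (cross-entropy) term tying predictions to soft pseudo-labels on unlabeled pixels, and a simple convex quadratic relaxation of the Potts term on a 4-connected grid. All shapes, function names, and the unit pairwise weight are illustrative assumptions, not the paper's actual formulation or implementation.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def quadratic_potts_relaxation(y):
    """Convex quadratic relaxation of the Potts loss on a 4-connected grid:
    0.5 * ||y_i - y_j||^2 summed over neighboring pixels (zero iff all
    neighboring label distributions agree). y has shape (H, W, K)."""
    h = 0.5 * np.sum((y[:, :-1, :] - y[:, 1:, :]) ** 2)  # horizontal pairs
    v = 0.5 * np.sum((y[:-1, :, :] - y[1:, :, :]) ** 2)  # vertical pairs
    return h + v

def soft_self_labeling_loss(logits, y_soft, scribble_mask, scribble_labels,
                            lam=1.0, eps=1e-12):
    """Toy composite objective (illustrative, not the paper's exact loss):
    - CE on scribbled pixels against their ground-truth class,
    - CE on unlabeled pixels against soft pseudo-labels y_soft,
    - lam * relaxed Potts smoothness term on y_soft."""
    p = softmax(logits)  # (H, W, K) predicted class distributions
    n_scr = int(scribble_mask.sum())
    ce_scr = -np.log(p[scribble_mask][np.arange(n_scr),
                                      scribble_labels] + eps).mean()
    unl = ~scribble_mask
    ce_soft = -(y_soft[unl] * np.log(p[unl] + eps)).sum(axis=-1).mean()
    return ce_scr + ce_soft + lam * quadratic_potts_relaxation(y_soft)
```

The soft pseudo-labels `y_soft` keep full class distributions per pixel, so uncertainty on unlabeled pixels enters the association term directly instead of being collapsed into a possibly wrong hard label.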