🤖 AI Summary
This work addresses the performance degradation in multi-task learning caused by partial labeling, heterogeneous output structures across tasks, and scarce annotations. It introduces, for the first time, invariance- and equivariance-based semi-supervised learning (specifically FixMatch and Dense FixMatch) into the partially labeled multi-task setting, jointly training object detection and semantic segmentation. Experiments on Cityscapes and BDD100K show that the proposed approach outperforms supervised baselines in most settings, with the largest gains when a task has few labeled samples. The equivariant variant generally performs better, and the results suggest that invariant and equivariant learning is a promising direction for multi-task learning under label scarcity.
📝 Abstract
We investigate the potential of invariant and equivariant semi-supervised learning for addressing the challenges of training multi-task models on partially labeled datasets whose tasks have differently structured outputs. Specifically, we use the popular FixMatch method for invariant semi-supervised learning and its equivariant extension, Dense FixMatch. We evaluate their performance on the Cityscapes and BDD100K datasets for two prevalent computer vision tasks: object detection and semantic segmentation. We consider varying sizes of the subset annotated for each task and different overlaps among these subsets. Both invariant and equivariant semi-supervised learning outperform supervised baselines in most situations; the largest improvements occur when fewer labeled samples are available for a task, and the equivariant approach generally yields better results. Our study suggests that invariant/equivariant learning is a promising general direction for multi-task learning from limited labeled data.
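The contrast between the invariant (FixMatch) and equivariant (Dense FixMatch) objectives named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes hard pseudo-labels from a weakly augmented view, a confidence threshold `tau = 0.95` (the default in the original FixMatch paper), and horizontal flipping as the only geometric transform between the weak and strong views. The function names `fixmatch_loss` and `dense_fixmatch_loss` are hypothetical. The invariant loss expects the strong view's prediction to equal the weak view's pseudo-label. The equivariant loss first applies the same geometric transform to the per-pixel pseudo-labels so that they line up with the strong view. How the paper handles detection outputs is not covered here and may differ.

```python
import numpy as np


def softmax(logits, axis=-1):
    # Numerically stable softmax along the class axis
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fixmatch_loss(weak_logits, strong_logits, tau=0.95):
    """Invariant, FixMatch-style loss for image-level predictions (sketch).

    weak_logits, strong_logits: shape (N, C), logits for the weakly and
    strongly augmented views of the same N unlabeled images.
    """
    probs = softmax(weak_logits, axis=-1)
    conf = probs.max(axis=-1)           # confidence of each pseudo-label
    pseudo = probs.argmax(axis=-1)      # hard pseudo-label per image
    mask = (conf >= tau).astype(float)  # keep only confident pseudo-labels
    log_p = np.log(softmax(strong_logits, axis=-1) + 1e-12)
    ce = -log_p[np.arange(len(pseudo)), pseudo]
    # Average over all unlabeled samples; masked samples contribute zero
    return float((ce * mask).mean())


def dense_fixmatch_loss(weak_logits, strong_logits, flipped, tau=0.95):
    """Equivariant, dense variant for per-pixel predictions (sketch).

    weak_logits, strong_logits: shape (N, C, H, W).
    flipped: bool array of shape (N,), True where the strong view was
    horizontally flipped relative to the weak view (assumed transform).
    """
    probs = softmax(weak_logits, axis=1)
    conf = probs.max(axis=1)            # (N, H, W)
    pseudo = probs.argmax(axis=1)       # (N, H, W)
    # Equivariance: map pseudo-labels through the same geometric transform
    sel = flipped[:, None, None]
    conf = np.where(sel, conf[:, :, ::-1], conf)
    pseudo = np.where(sel, pseudo[:, :, ::-1], pseudo)
    mask = (conf >= tau).astype(float)
    log_p = np.log(softmax(strong_logits, axis=1) + 1e-12)
    # Log-probability of the pseudo-label class at every pixel
    ce = -np.take_along_axis(log_p, pseudo[:, None], axis=1)[:, 0]
    return float((ce * mask).mean())
```

Photometric distortions in the strong view need no such mapping, because they do not move pixels. Only geometric transforms have to be applied to the pseudo-labels. That is why the flip is applied to `pseudo` and `conf` but not to the image content itself.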