🤖 AI Summary
This work proposes a general-purpose image segmentation method that operates without mask-based mechanisms, leveraging a diffusion model for end-to-end holistic prediction. The key innovations are a location-aware palette with a 2D Gray code ordering, which supports principled uncertainty modeling, together with a final tanh activation and a sigmoid-based loss weighting suited to the discrete output space. Although the approach does not yet surpass state-of-the-art mask-based methods, it significantly narrows the gap while demonstrating a distinctive capacity for uncertainty-aware segmentation. This capability opens a promising new direction for integrating large-scale pretraining into segmentation frameworks.
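As a concrete illustration of the Gray-code idea behind such palette orderings, here is the standard 1D binary-reflected construction (a sketch only: the paper's 2D extension to palette coordinates is not reproduced, and `gray_encode`/`gray_decode` are hypothetical names):

```python
def gray_encode(n: int) -> int:
    """Binary-reflected Gray code: consecutive integers differ in one bit."""
    return n ^ (n >> 1)

def gray_decode(g: int) -> int:
    """Invert the Gray code by cumulatively XOR-ing the shifted bits."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Consecutive codes differ in exactly one bit, so a small perturbation of
# an encoded palette value tends to decode to a nearby class index --
# the property that makes Gray orderings attractive for noisy outputs.
for i in range(15):
    diff = gray_encode(i) ^ gray_encode(i + 1)
    assert diff & (diff - 1) == 0  # power of two => single bit flipped
    assert gray_decode(gray_encode(i)) == i
```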
📝 Abstract
This paper introduces a diffusion-based framework for universal image segmentation, enabling segmentation without mask-based frameworks by predicting the full segmentation holistically. We present several key adaptations of diffusion models that matter in this discrete setting. Notably, we show that a location-aware palette with our 2D Gray code ordering improves performance, and that a final tanh activation is crucial for discrete data. Among diffusion parameterizations, sigmoid loss weighting consistently outperforms alternatives regardless of the prediction type used, and we settle on x-prediction. While our current model does not yet surpass leading mask-based architectures, it narrows the performance gap and introduces unique capabilities, such as principled ambiguity modeling, that those models lack. All models were trained from scratch, and we believe that combining our proposed improvements with large-scale pretraining or promptable conditioning could lead to competitive models.
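A minimal sketch of sigmoid loss weighting as commonly used in diffusion objectives, expressed over the log signal-to-noise ratio (the function name, the bias value, and the exact form are assumptions; the paper's precise weighting is not reproduced here):

```python
import math

def sigmoid_loss_weight(log_snr: float, bias: float = 2.0) -> float:
    """Sigmoid weighting over log-SNR: down-weights very high-SNR
    (nearly clean) timesteps and saturates toward 1 for noisy ones.
    The bias shifts where the transition occurs; 2.0 is an assumed default,
    not a value taken from the paper."""
    return 1.0 / (1.0 + math.exp(log_snr - bias))

# With x-prediction, the per-timestep reconstruction loss would be scaled
# by this weight, emphasizing mid- and low-SNR steps where denoising is
# hardest and the prediction carries the most information.
```

At `log_snr == bias` the weight is exactly 0.5, and it decreases monotonically as the signal gets cleaner, which is the qualitative behavior that distinguishes sigmoid weighting from uniform or SNR-proportional alternatives.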