🤖 AI Summary
This work proposes Selfment, a novel framework that achieves high-precision object segmentation in a fully unsupervised manner, requiring neither human annotations, pretrained segmentation models, nor post-processing. The method constructs a patch-level affinity graph from self-supervised features and applies normalized cuts (NCut) together with an Iterative Patch Optimization (IPO) mechanism to generate high-quality initial masks. A lightweight segmentation head is then trained end-to-end on these masks to produce the final predictions. IPO enhances both spatial and semantic consistency, and training is further regularized by contrastive and region-consistency losses. Experiments show significant gains, with $F_{\max}$ improvements of 4.0%, 4.6%, and 5.7% on the ECSSD, HKUIS, and PASCAL-S benchmarks, respectively. Notably, Selfment also achieves state-of-the-art zero-shot performance on camouflaged object detection.
📝 Abstract
Accurately segmenting objects without any manual annotations remains one of the core challenges in computer vision. In this work, we introduce Selfment, a fully self-supervised framework that segments foreground objects directly from raw images without human labels, pretrained segmentation models, or any post-processing. Selfment first constructs patch-level affinity graphs from self-supervised features and applies NCut to obtain an initial coarse foreground--background separation. We then introduce Iterative Patch Optimization (IPO), a feature-space refinement procedure that progressively enforces spatial coherence and semantic consistency through iterative patch clustering. The refined masks are subsequently used as supervisory signals to train a lightweight segmentation head with contrastive and region-consistency objectives, allowing the model to learn stable and transferable object representations. Despite its simplicity and complete absence of manual supervision, Selfment sets new state-of-the-art (SoTA) results across multiple benchmarks. It achieves substantial improvements on $F_{\max}$ over previous unsupervised saliency detection methods on ECSSD ($+4.0\%$), HKUIS ($+4.6\%$), and PASCAL-S ($+5.7\%$). Moreover, without any additional fine-tuning, Selfment demonstrates remarkable zero-shot generalization to camouflaged object detection tasks (e.g., $0.910$ $S_m$ on CHAMELEON and $0.792$ $F_{\beta}^{\omega}$ on CAMO), outperforming all existing unsupervised approaches and even rivaling the SoTA fully supervised methods.
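The initial coarse separation described above (a patch-level affinity graph partitioned with NCut) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the cosine-similarity affinity, the threshold `tau`, and the smaller-partition-is-foreground heuristic are all assumptions for the sketch; the actual patch features, graph construction, and the IPO refinement step are not reproduced here.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_foreground(patch_feats, tau=0.2):
    """Coarse foreground/background split of image patches via NCut.

    patch_feats: (N, d) array of per-patch feature vectors.
    Returns a boolean mask of length N (True = foreground).
    """
    # Cosine-similarity affinity between patches.
    f = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    W = f @ f.T
    # Binarize with a small floor so the graph stays connected
    # (tau is an illustrative hyperparameter, not from the paper).
    W = np.where(W > tau, 1.0, 1e-5)
    D = np.diag(W.sum(axis=1))
    # NCut relaxation: second-smallest generalized eigenvector of
    # (D - W) x = lambda * D x (the Fiedler vector).
    _, vecs = eigh(D - W, D)
    fiedler = vecs[:, 1]
    # Bipartition at the mean; take the smaller side as foreground,
    # a common heuristic since objects usually cover fewer patches.
    mask = fiedler > fiedler.mean()
    if mask.sum() > (~mask).sum():
        mask = ~mask
    return mask
```

In the full pipeline this coarse mask would then be refined by IPO and used as a supervisory signal for the lightweight segmentation head.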