🤖 AI Summary
This paper addresses the unreliability of pseudo-masks in unsupervised salient object detection (SOD), which stems from the absence of pixel-level annotations. To this end, the authors propose AutoSOD, a fully end-to-end framework built on a novel split-fuse-transport design (POTNet). Its core components are: (i) an entropy-guided dual-clustering head that routes high-entropy pixels to spectral clustering and low-entropy pixels to k-means; (ii) optimal transport (OT) to align the two resulting prototype sets; and (iii) a MaskFormer-style encoder-decoder supervised by the resulting high-quality, part-aware, boundary-sharp pseudo-masks, which are produced in a single forward pass without hand-crafted priors or offline voting heuristics. Evaluated on five standard benchmarks, AutoSOD achieves gains of up to 26% and 36% in F-measure over state-of-the-art unsupervised and weakly supervised methods, respectively, closely approaching fully supervised performance while improving both training efficiency and segmentation accuracy.
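The entropy-guided split in (i) can be sketched in a few lines: each pixel's soft cluster-assignment distribution is scored by its Shannon entropy, and pixels above a threshold are treated as boundary-like. This is a minimal numpy sketch; the median threshold is an assumption for illustration, not necessarily the paper's rule.

```python
import numpy as np

def entropy_split(probs, tau=None):
    """Split pixels by the entropy of their soft cluster assignments.

    probs: (N, K) soft assignment probabilities per pixel.
    Returns a boolean mask: True = high-entropy (boundary-like) pixel,
    to be handled by spectral clustering; False = low-entropy interior
    pixel, to be handled by k-means.
    """
    h = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # Shannon entropy per pixel
    if tau is None:
        tau = np.median(h)  # assumed threshold; the actual rule may differ
    return h > tau

# Toy input: confident interior pixels vs. ambiguous boundary pixels.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident  -> low entropy
    [0.34, 0.33, 0.33],   # ambiguous  -> high entropy
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
])
mask = entropy_split(probs)
print(mask)  # True where the pixel should go to the spectral branch
```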
📝 Abstract
Salient object detection (SOD) aims to segment visually prominent regions in images and serves as a foundational task for various computer vision applications. We posit that SOD can now reach near-supervised accuracy without a single pixel-level label, but only when reliable pseudo-masks are available. We revisit the prototype-based line of work and make two key observations. First, boundary pixels and interior pixels obey markedly different geometry; second, the global consistency enforced by optimal transport (OT) is underutilized if prototype quality is weak. To address this, we introduce POTNet, an adaptation of Prototypical Optimal Transport that replaces POT's single k-means step with an entropy-guided dual-clustering head: high-entropy pixels are organized by spectral clustering, low-entropy pixels by k-means, and the two prototype sets are subsequently aligned by OT. This split-fuse-transport design yields sharper, part-aware pseudo-masks in a single forward pass, without handcrafted priors. Those masks supervise a standard MaskFormer-style encoder-decoder, giving rise to AutoSOD, an end-to-end unsupervised SOD pipeline that eliminates SelfMask's offline voting yet improves both accuracy and training efficiency. Extensive experiments on five benchmarks show that AutoSOD outperforms unsupervised methods by up to 26% and weakly supervised methods by up to 36% in F-measure, further narrowing the gap to fully supervised models.
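The transport step of the split-fuse-transport design, aligning the two prototype sets by OT, can be sketched with entropic Sinkhorn iterations. This is a minimal numpy sketch: the pairwise Euclidean cost and uniform marginals are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic-regularized OT plan for uniform marginals (Sinkhorn iterations)."""
    K = np.exp(-cost / eps)                       # Gibbs kernel
    a = np.full(cost.shape[0], 1.0 / cost.shape[0])
    b = np.full(cost.shape[1], 1.0 / cost.shape[1])
    v = np.ones_like(b)
    for _ in range(n_iters):                      # alternate scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]            # transport plan

# Toy prototypes: Q is a permuted, slightly noisy copy of P,
# standing in for prototype sets from the two clustering branches.
rng = np.random.default_rng(0)
P = rng.normal(size=(4, 8))                                # k-means branch
Q = P[[2, 0, 3, 1]] + 0.01 * rng.normal(size=(4, 8))       # spectral branch

cost = np.linalg.norm(P[:, None] - Q[None, :], axis=-1)    # Euclidean cost
plan = sinkhorn(cost)
match = plan.argmax(axis=1)   # hard matching: P[i] -> its noisy copy in Q
print(match)
```

Because the plan concentrates its mass on low-cost pairs, the row-wise argmax recovers the correspondence between the two prototype sets even though their ordering differs.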