🤖 AI Summary
This work addresses key limitations of existing training-free segmentation methods, which treat segmentation as spectral graph partitioning, require a preset number of clusters, oversmooth boundaries, are sensitive to noise, and struggle to capture local structure. The authors reformulate training-free segmentation as a stochastic flow equilibrium problem on a diffusion-based affinity graph. Using local neighborhoods extracted from Stable Diffusion, they construct a sparse yet expressive affinity graph that integrates global diffusion attention with fine-grained local structure. Labels are propagated by a Markov random walk with adaptive pruning. The approach abandons the conventional spectral clustering framework and improves boundary sharpness, regional consistency, and segmentation stability, achieving state-of-the-art zero-shot performance across seven semantic segmentation benchmarks.
📝 Abstract
We argue that existing training-free segmentation methods rely on an implicit and limiting assumption: that segmentation is a spectral graph partitioning problem over diffusion-derived affinities. Such approaches, based on global graph partitioning and eigenvector-based formulations of affinity matrices, suffer from several fundamental drawbacks: they require pre-selecting the number of clusters, induce boundary oversmoothing due to spectral relaxation, and remain highly sensitive to noisy or multi-modal affinity distributions. Moreover, many prior works neglect local neighborhood structure, which plays a crucial role in stabilizing affinity propagation and preserving fine-grained contours. To address these limitations, we reformulate training-free segmentation as a stochastic flow equilibrium problem over diffusion-induced affinity graphs. Here, segmentation emerges from a stochastic propagation process that integrates global diffusion attention with local neighborhoods extracted from Stable Diffusion, yielding a sparse yet expressive affinity structure. Building on this formulation, we introduce a Markov propagation scheme that performs random-walk-based label diffusion with an adaptive pruning strategy, which suppresses unreliable transitions while reinforcing confident affinity paths. Experiments across seven widely used semantic segmentation benchmarks demonstrate that our method achieves state-of-the-art zero-shot performance, producing sharper boundaries, more coherent regions, and significantly more stable masks than prior spectral-clustering-based approaches.
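As a rough illustration of random-walk label diffusion with pruning on an affinity graph, the minimal sketch below may help. It is not the method from the paper, whose details the abstract does not give. The function names, the relative pruning threshold `tau`, the restart weight `alpha`, the fixed iteration count, and the use of seed labels are all assumptions made for illustration. A small dense NumPy matrix stands in for the sparse diffusion-derived affinity graph.

```python
import numpy as np

def row_normalize(A):
    """Turn a non-negative affinity matrix into a row-stochastic transition matrix."""
    sums = A.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0  # isolated nodes keep an all-zero row
    return A / sums

def prune(P, tau):
    """Zero out transitions weaker than tau times the row maximum, then renormalize.

    Hypothetical pruning rule: the threshold adapts to each row's strongest transition.
    """
    row_max = P.max(axis=1, keepdims=True)
    return row_normalize(np.where(P >= tau * row_max, P, 0.0))

def propagate_labels(A, seeds, tau=0.5, alpha=0.9, n_iter=50):
    """Random-walk-with-restart label diffusion (illustrative only).

    A:     (n, n) non-negative affinity matrix
    seeds: (n, k) one-hot rows for seeded nodes, all-zero rows elsewhere
    """
    P = prune(row_normalize(A.astype(float)), tau)
    S = seeds.astype(float)
    F = S.copy()
    for _ in range(n_iter):
        # Each node averages its neighbors' label scores; seeds are re-injected.
        F = alpha * (P @ F) + (1 - alpha) * S
    return F.argmax(axis=1)

# Toy graph: two tight triangles joined by one weak edge (2 <-> 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1

seeds = np.zeros((6, 2))
seeds[0, 0] = 1.0  # node 0 seeded with label 0
seeds[5, 1] = 1.0  # node 5 seeded with label 1

labels = propagate_labels(A, seeds)
print(labels)  # [0 0 0 1 1 1]
```

In this toy case the weak bridge between the two triangles falls below the per-row threshold and is pruned, so each seed's label spreads only within its own triangle. The number of regions is set by the seeds rather than by a preselected cluster count, which is one way to read the abstract's contrast with spectral clustering.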