🤖 AI Summary
Existing prompt-free image segmentation methods (e.g., SAM) suffer from two key limitations: weak locality, i.e., no mechanism for autonomous region localization, and poor scalability, i.e., insufficient fine-grained modeling at high resolutions. To address these, we propose Grc-SAM, a coarse-to-fine, multi-granularity prompt-free segmentation framework. Its core innovations are an adaptive foreground localization mechanism and sparse local Swin-style attention, which together enable end-to-end inference from coarse response regions to fine-grained local refinement via high-response feature extraction and latent prompt embedding. Built on a vision transformer backbone, Grc-SAM eliminates reliance on hand-crafted prompts and supports accurate segmentation of high-resolution inputs. Extensive experiments show that Grc-SAM significantly outperforms state-of-the-art prompt-free methods across multiple benchmarks, achieving both higher segmentation accuracy and better resolution scalability.
📝 Abstract
Prompt-free image segmentation aims to generate accurate masks without manual guidance. Typical pre-trained models, notably the Segment Anything Model (SAM), generate prompts directly at a single granularity level. However, this approach has two limitations: (1) localizability: it lacks mechanisms for autonomous region localization; (2) scalability: fine-grained modeling is limited at high resolution. To address these challenges, we introduce Granular Computing-driven SAM (Grc-SAM), a coarse-to-fine framework motivated by Granular Computing (GrC). First, the coarse stage adaptively extracts high-response regions from features, achieving precise foreground localization and reducing reliance on external prompts. Second, the fine stage applies finer patch partitioning with sparse local Swin-style attention to enhance detail modeling and enable high-resolution segmentation. Third, refined masks are encoded as latent prompt embeddings for the SAM decoder, replacing handcrafted prompts with an automated reasoning process. By integrating multi-granularity attention, Grc-SAM bridges granular computing and vision transformers. Extensive experimental results demonstrate that Grc-SAM outperforms baseline methods in both accuracy and scalability, offering a distinctive granular-computing perspective on prompt-free segmentation.
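The coarse-to-fine flow described above can be illustrated with a minimal toy sketch. This is not the paper's implementation: the function names (`coarse_localize`, `fine_refine`) and the per-patch pooling used as a stand-in for sparse local Swin-style attention are hypothetical simplifications, shown only to make the two-stage idea concrete: first locate the high-response region, then re-partition it at a finer granularity to refine the mask.

```python
import numpy as np

def coarse_localize(response, thresh=0.5):
    """Coarse stage (toy): bounding box of high-response features.

    Hypothetical stand-in for Grc-SAM's adaptive foreground
    localization; returns (y0, y1, x0, x1) or None if nothing
    exceeds the threshold.
    """
    fg = response >= thresh
    ys, xs = np.nonzero(fg)
    if ys.size == 0:
        return None
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

def fine_refine(response, box, patch=2, thresh=0.5):
    """Fine stage (toy): crop the coarse region and partition it into
    finer patches. Per-patch max pooling stands in here for the
    sparse local attention used in the actual method.
    """
    y0, y1, x0, x1 = box
    crop = response[y0:y1, x0:x1]
    h = (crop.shape[0] // patch) * patch
    w = (crop.shape[1] // patch) * patch
    crop = crop[:h, :w]
    pooled = crop.reshape(h // patch, patch, w // patch, patch).max(axis=(1, 3))
    return pooled >= thresh  # refined per-patch mask

# Toy response map: a bright 4x4 square inside an 8x8 feature field.
resp = np.zeros((8, 8))
resp[2:6, 2:6] = 0.9

box = coarse_localize(resp)       # → (2, 6, 2, 6)
refined = fine_refine(resp, box)  # 2x2 boolean mask, all True
```

In the actual framework, the refined mask would then be encoded as a latent prompt embedding and passed to the SAM decoder instead of a handcrafted point or box prompt.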