Granular Computing-driven SAM: From Coarse-to-Fine Guidance for Prompt-Free Segmentation

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing prompt-free image segmentation methods (e.g., SAM) suffer from two key limitations: weak locality (no autonomous region localization) and poor scalability (insufficient fine-grained modeling at high resolutions). To address these, we propose Grc-SAM, a coarse-to-fine multi-granularity prompt-free segmentation framework. Its core innovations are an adaptive foreground localization mechanism and sparse local Swin-style attention, which enable end-to-end inference from coarse response regions to fine-grained local refinement via high-response feature extraction and latent prompt embedding. Built on a vision transformer backbone, Grc-SAM eliminates reliance on hand-crafted prompts and supports accurate segmentation of high-resolution inputs. Extensive experiments demonstrate that Grc-SAM significantly outperforms state-of-the-art prompt-free methods across multiple benchmarks, achieving both higher segmentation accuracy and superior resolution scalability.

📝 Abstract
Prompt-free image segmentation aims to generate accurate masks without manual guidance. Typical pre-trained models, notably the Segment Anything Model (SAM), generate prompts directly at a single granularity level. However, this approach has two limitations: (1) localizability, lacking mechanisms for autonomous region localization; (2) scalability, with limited fine-grained modeling at high resolution. To address these challenges, we introduce Granular Computing-driven SAM (Grc-SAM), a coarse-to-fine framework motivated by Granular Computing (GrC). First, the coarse stage adaptively extracts high-response regions from features to achieve precise foreground localization and reduce reliance on external prompts. Second, the fine stage applies finer patch partitioning with sparse local Swin-style attention to enhance detail modeling and enable high-resolution segmentation. Third, refined masks are encoded as latent prompt embeddings for the SAM decoder, replacing handcrafted prompts with an automated reasoning process. By integrating multi-granularity attention, Grc-SAM bridges granular computing with vision transformers. Extensive experimental results demonstrate that Grc-SAM outperforms baseline methods in both accuracy and scalability, offering a unique granular computational perspective for prompt-free segmentation.
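The abstract's coarse stage "adaptively extracts high-response regions from features" for foreground localization. The paper's exact mechanism is not given here; the sketch below is only a minimal illustration of the idea, assuming a simple response-map thresholding scheme (the function name, `keep_ratio` parameter, and threshold rule are all my assumptions, not the authors' implementation).

```python
import numpy as np

def coarse_foreground_box(feat, keep_ratio=0.05):
    """Hypothetical coarse-stage localization: average channel
    responses, keep the top `keep_ratio` fraction of spatial
    positions, and return their bounding box (y0, x0, y1, x1)."""
    # feat: (C, H, W) backbone feature map
    response = feat.mean(axis=0)                      # (H, W) response map
    thresh = np.quantile(response, 1.0 - keep_ratio)  # cutoff for "high response"
    ys, xs = np.nonzero(response >= thresh)
    return ys.min(), xs.min(), ys.max(), xs.max()

# Toy feature map with one strongly activated patch.
feat = np.zeros((8, 16, 16))
feat[:, 2:6, 3:7] = 5.0
print(coarse_foreground_box(feat))  # -> (2, 3, 5, 6)
```

In the actual framework this localization step would feed the fine stage, restricting the finer patch partitioning and sparse local attention to the detected region rather than the full image.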
Problem

Research questions and friction points this paper is trying to address.

Achieving autonomous region localization without manual prompts
Enabling fine-grained modeling for high-resolution image segmentation
Replacing handcrafted prompts with automated granular-computing reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Coarse stage adaptively extracts high-response regions
Fine stage applies patch partitioning with attention
Encodes refined masks as latent prompt embeddings
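The last innovation, encoding refined masks as latent prompt embeddings for the SAM decoder, can be illustrated with a small sketch. This is only an assumed scheme (pool the mask to a token grid, then project each pooled value to the decoder's embedding width); the random projection stands in for a learned one, and none of the names or shapes come from the paper.

```python
import numpy as np

def mask_to_latent_prompt(mask, embed_dim=256, token_hw=4, seed=0):
    """Hypothetical mask-to-prompt encoding: average-pool a refined
    binary mask to a token_hw x token_hw grid, then project each
    pooled value to embed_dim with a stand-in random projection."""
    h, w = mask.shape
    sh, sw = h // token_hw, w // token_hw
    # Average-pool the mask into a coarse token grid.
    pooled = mask[:sh * token_hw, :sw * token_hw] \
        .reshape(token_hw, sh, token_hw, sw).mean(axis=(1, 3))
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((1, embed_dim))  # stand-in for learned weights
    tokens = pooled.reshape(-1, 1) @ proj       # (token_hw**2, embed_dim) prompt tokens
    return tokens

mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0
print(mask_to_latent_prompt(mask).shape)  # -> (16, 256)
```

Tokens of this shape could then be consumed where SAM's decoder normally takes sparse prompt embeddings, which is the substitution the paper describes as replacing handcrafted prompts with automated reasoning.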
Qiyang Yu
School of Computer Science and Software Engineering, Southwest Petroleum University
Yu Fang
Honda Research Institute Japan Co., Ltd.
Human-Robot Interaction · Eye-head Coordination · Eye Movement · Visual Perception/Cognition
Tianrui Li
School of Computing and Artificial Intelligence, Southwest Jiaotong University
Big Data Intelligence · Urban Computing · Granular Computing
Xuemei Cao
School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics
Yan Chen
School of Computer Science and Software Engineering, Southwest Petroleum University
Jianghao Li
School of Computer Science and Software Engineering, Southwest Petroleum University
Fan Min
School of Computer Science and Software Engineering, Southwest Petroleum University
Yi Zhang
College of Computer Science, Sichuan University