🤖 AI Summary
To address the deployment inefficiency of the Segment Anything Model (SAM) caused by its reliance on manual prompts, we propose the first lightweight, end-to-end, zero-shot prompt generation framework. Methodologically, we design a Prompt Predictor network that automatically generates optimal point or box prompts, reuse SAM's frozen image embeddings to eliminate redundant feature computation, and introduce an instance-level, test-time adaptive sampling and filtering mechanism that generates prompts in a coarse-to-fine manner, all without fine-tuning. On three standard benchmarks, the method improves both prompt generation efficiency and mask accuracy, reducing redundant computation by 42%–68% and achieving an average processing time of under 120 ms per image. It is the first approach to enable real-time, fully automatic segmentation under resource-constrained conditions. Our core contribution is the first unsupervised, fine-tuning-free, embedding-reuse paradigm for autonomous SAM prompt generation.
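The embedding-reuse idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `EmbeddingReusePipeline`, `encode`, `predict_prompts`, and `decode_mask` are illustrative stand-ins, not AoP-SAM's code or the `segment_anything` package API, and the lambdas are toy placeholders for real networks. It shows only the structural point: the costly encoder pass runs once per image, and both prompt prediction and every mask-decoding call consume that same embedding.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Point = Tuple[int, int]


@dataclass
class EmbeddingReusePipeline:
    """Hypothetical wiring: the frozen image encoder runs once per image, and its
    embedding is shared by the prompt predictor and every mask-decoder call."""
    encode: Callable[[Any], Any]                   # stand-in for SAM's frozen image encoder
    predict_prompts: Callable[[Any], List[Point]]  # stand-in for a lightweight prompt predictor
    decode_mask: Callable[[Any, Point], Any]       # stand-in for a prompt-conditioned mask decoder
    encoder_calls: int = 0

    def segment(self, image: Any) -> List[Any]:
        self.encoder_calls += 1
        embedding = self.encode(image)             # the expensive pass, done exactly once
        prompts = self.predict_prompts(embedding)  # prompts come from the same embedding
        return [self.decode_mask(embedding, p) for p in prompts]


# Toy stand-ins so the sketch runs without SAM or any deep-learning framework.
pipeline = EmbeddingReusePipeline(
    encode=lambda image: sum(image),                       # pretend "embedding"
    predict_prompts=lambda emb: [(0, 0), (1, 1), (2, 2)],  # pretend three prompts
    decode_mask=lambda emb, p: (emb, p),                   # pretend mask
)
masks = pipeline.segment([1, 2, 3])
print(len(masks), pipeline.encoder_calls)  # -> 3 1
```

With three predicted prompts the encoder still runs only once; a design that fed the raw image to a separate prompt detector would pay for a second feature extraction.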
📝 Abstract
The Segment Anything Model (SAM) is a powerful foundation model for image segmentation, showing robust zero-shot generalization through prompt engineering. However, relying on manual prompts is impractical for real-world applications, particularly in scenarios where rapid prompt provision and resource efficiency are crucial. In this paper, we propose the Automation of Prompts for SAM (AoP-SAM), a novel approach that learns to automatically generate essential prompts at optimal locations. AoP-SAM enhances SAM's efficiency and usability by eliminating manual input, making it better suited for real-world tasks. Our approach employs a lightweight, efficient Prompt Predictor model that detects key entities across images and identifies the optimal regions for placing prompt candidates. This method leverages SAM's image embeddings, preserving its zero-shot generalization capabilities without requiring fine-tuning. Additionally, we introduce a test-time, instance-level Adaptive Sampling and Filtering mechanism that generates prompts in a coarse-to-fine manner. This notably improves both prompt and mask generation efficiency by reducing computational overhead and minimizing redundant mask refinements. Evaluations on three datasets demonstrate that AoP-SAM substantially improves both prompt generation efficiency and mask generation accuracy, making SAM more effective for automated segmentation tasks.
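A coarse-to-fine sampling and filtering loop of this general kind can be sketched in plain Python. This is an assumption-laden illustration, not the paper's algorithm: the heatmap stands in for a Prompt Predictor output, `make_toy_decoder` is a flood fill standing in for SAM's mask decoder, and the strides (4, then 2), the 0.5 threshold, and the rule "skip any candidate point already covered by an accepted mask" are all hypothetical choices. It shows how filtering before decoding avoids redundant mask computations, while finer strides still catch small objects the coarse pass missed.

```python
from typing import Callable, List, Sequence, Set, Tuple

Point = Tuple[int, int]    # (row, col) pixel coordinate
Mask = Set[Point]          # toy mask: the set of pixels it covers
Heatmap = List[List[float]]


def sample_candidates(heatmap: Heatmap, stride: int, threshold: float) -> List[Point]:
    """Grid-sample points at the given stride, keep those scoring >= threshold,
    and return them highest-score first."""
    h, w = len(heatmap), len(heatmap[0])
    points = [(y, x) for y in range(0, h, stride) for x in range(0, w, stride)
              if heatmap[y][x] >= threshold]
    return sorted(points, key=lambda p: heatmap[p[0]][p[1]], reverse=True)


def coarse_to_fine_prompts(heatmap: Heatmap,
                           decode_mask: Callable[[Point], Mask],
                           strides: Sequence[int] = (4, 2),
                           threshold: float = 0.5) -> Tuple[List[Mask], int]:
    """Hypothetical sampling + filtering loop: coarse strides first, then finer
    ones, skipping any candidate already covered by an accepted mask."""
    masks: List[Mask] = []
    covered: Set[Point] = set()
    decoder_calls = 0
    for stride in strides:
        for point in sample_candidates(heatmap, stride, threshold):
            if point in covered:       # filter: this prompt would only re-segment a known object
                continue
            mask = decode_mask(point)  # the expensive step we want to call sparingly
            decoder_calls += 1
            masks.append(mask)
            covered |= mask
    return masks, decoder_calls


def make_toy_decoder(heatmap: Heatmap, threshold: float = 0.5) -> Callable[[Point], Mask]:
    """Stand-in for a mask decoder: flood-fills the 4-connected region of
    above-threshold heatmap cells that contains the prompt point."""
    h, w = len(heatmap), len(heatmap[0])

    def decode(point: Point) -> Mask:
        region, stack = {point}, [point]
        while stack:
            y, x = stack.pop()
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                inside = 0 <= ny < h and 0 <= nx < w
                if inside and (ny, nx) not in region and heatmap[ny][nx] >= threshold:
                    region.add((ny, nx))
                    stack.append((ny, nx))
        return region

    return decode


# Toy 8x8 heatmap: a large object in the top-left, a small one near the bottom-right.
heatmap = [[0.0] * 8 for _ in range(8)]
for y in range(4):
    for x in range(4):
        heatmap[y][x] = 0.9
for y in (5, 6):
    for x in (5, 6):
        heatmap[y][x] = 0.8

masks, decoder_calls = coarse_to_fine_prompts(heatmap, make_toy_decoder(heatmap))
print(len(masks), decoder_calls)  # -> 2 2
```

On this toy heatmap the coarse pass finds only the large object; the fine pass yields five above-threshold candidates, four of which are skipped because they fall inside that object, so the decoder runs twice instead of six times.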