🤖 AI Summary
The Segment Anything Model (SAM) is sensitive to prompt quality, and existing approaches rely on heuristic or manually designed point prompts, which limits scalability and generalization. This paper proposes Point Prompt Defender, a fine-tuning-free, plug-and-play method for automatic point prompt optimization. Image patches are represented as nodes in a dual-space graph whose edges encode both physical and semantic distances. On top of this graph, an attack-for-defense adversarial reinforcement learning framework trains two agents: an attacker activates the subset of prompts that most degrades SAM’s segmentation, while a defender learns to suppress those disruptive prompts and restore accuracy. Both agents are trained with Deep Q-Networks, using a reward based on the change in segmentation quality. At inference, only the defender is deployed to refine coarse prompt sets. The reported experiments show improved segmentation accuracy under coarse prompting and better generalization across tasks.
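To make the dual-space graph concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the use of Euclidean distance between patch centers as the physical distance, cosine distance between patch feature vectors as the semantic distance, and the k-nearest-neighbour rule for choosing edges are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def build_dual_space_graph(centers, features, k=2):
    """Build an undirected patch graph (hypothetical helper, not from the paper).

    centers:  list of (row, col) patch centers  -> physical space
    features: list of feature vectors per patch -> semantic space
    Returns {(i, j): (physical_distance, semantic_distance)} with i < j.
    Assumption: a pair is connected if either patch is among the other's
    k nearest neighbours in the physical space or in the semantic space.
    """
    n = len(centers)
    phys, sem = {}, {}
    for i, j in combinations(range(n), 2):
        phys[(i, j)] = math.dist(centers[i], centers[j])
        sem[(i, j)] = cosine_distance(features[i], features[j])

    def lookup(table, i, j):
        return table[(min(i, j), max(i, j))]

    edges = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for table in (phys, sem):
            # k nearest neighbours of patch i in this space
            nearest = sorted(others, key=lambda j: lookup(table, i, j))[:k]
            for j in nearest:
                key = (min(i, j), max(i, j))
                # every edge carries both distances, whichever space selected it
                edges[key] = (phys[key], sem[key])
    return edges
```

In this sketch each edge stores both distances, so an agent acting on the graph can see that two patches are spatially adjacent yet semantically different, or the reverse.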
📝 Abstract
Prompt quality plays a critical role in the performance of the Segment Anything Model (SAM), yet existing approaches often rely on heuristic or manually crafted prompts, limiting scalability and generalization. In this paper, we propose Point Prompt Defender, an adversarial reinforcement learning framework that adopts an attack-for-defense paradigm to automatically optimize point prompts. We construct a task-agnostic point prompt environment by representing image patches as nodes in a dual-space graph, where edges encode both physical and semantic distances. Within this environment, an attacker agent learns to activate a subset of prompts that maximally degrade SAM's segmentation performance, while a defender agent learns to suppress these disruptive prompts and restore accuracy. Both agents are trained using Deep Q-Networks with a reward signal based on segmentation quality variation. During inference, only the defender is deployed to refine arbitrary coarse prompt sets, enabling enhanced SAM segmentation performance across diverse tasks without retraining. Extensive experiments show that Point Prompt Defender effectively improves SAM's robustness and generalization, establishing a flexible, interpretable, and plug-and-play framework for prompt-based segmentation.
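The reward "based on segmentation quality variation" can be illustrated with a small sketch. The abstract does not specify the quality metric or how the reward is split between the agents, so the choices here are assumptions: quality is measured as intersection over union (IoU) against a reference mask, and the two agents receive opposite-signed rewards. The function names are hypothetical.

```python
def iou(pred, target):
    """IoU of two binary masks, each given as a set of (row, col) pixels."""
    union = pred | target
    if not union:
        # both masks empty: treat as a perfect match
        return 1.0
    return len(pred & target) / len(union)


def quality_variation_rewards(mask_before, mask_after, reference):
    """Reward each agent by the change in segmentation quality after one action.

    Assumption: the defender is rewarded for increasing IoU and the attacker
    receives the negated value, i.e. it is rewarded for decreasing IoU.
    """
    delta = iou(mask_after, reference) - iou(mask_before, reference)
    return {"defender": delta, "attacker": -delta}


# Usage: a prompt change that grows the predicted mask to match the reference.
reference = {(0, 0), (0, 1), (1, 0), (1, 1)}
before = {(0, 0), (0, 1)}
after = set(reference)
rewards = quality_variation_rewards(before, after, reference)
```

Under these assumptions, a scalar of this kind is what a Deep Q-Network would receive as the per-step reward: the attacker is pushed toward prompt subsets that lower IoU, and the defender toward suppressing them.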