BREPS: Bounding-Box Robustness Evaluation of Promptable Segmentation

📅 2026-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited robustness of existing promptable segmentation models to real-world, user-provided bounding box prompts, to which they are highly sensitive even under minor perturbations. To evaluate this vulnerability systematically, we introduce BREPS, a methodology that formulates robustness testing as a white-box optimization problem balancing naturalness constraints against segmentation performance, enabling efficient evaluation through adversarial bounding box generation. Leveraging real annotations collected in user studies, we construct a comprehensive benchmark spanning ten diverse datasets, from everyday scenes to medical imaging, and show that BREPS reveals the fragility of current models under realistic prompting conditions.

📝 Abstract
Promptable segmentation models such as SAM have established a powerful paradigm, enabling strong generalization to unseen objects and domains with minimal user input, including points, bounding boxes, and text prompts. Among these, bounding boxes stand out as particularly effective, often outperforming points while significantly reducing annotation costs. However, current training and evaluation protocols typically rely on synthetic prompts generated through simple heuristics, offering limited insight into real-world robustness. In this paper, we investigate the robustness of promptable segmentation models to natural variations in bounding box prompts. First, we conduct a controlled user study and collect thousands of real bounding box annotations. Our analysis reveals substantial variability in segmentation quality across users for the same model and instance, indicating that SAM-like models are highly sensitive to natural prompt noise. Then, since exhaustive testing of all possible user inputs is computationally prohibitive, we reformulate robustness evaluation as a white-box optimization problem over the bounding box prompt space. We introduce BREPS, a method for generating adversarial bounding boxes that minimize or maximize segmentation error while adhering to naturalness constraints. Finally, we benchmark state-of-the-art models across 10 datasets, spanning everyday scenes to medical imaging. Code - https://github.com/emb-ai/BREPS.
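The abstract frames robustness evaluation as a search over the bounding box prompt space for prompts that maximize segmentation error while staying within naturalness constraints. As a rough illustration only, the sketch below substitutes a toy interior-fill `segment` function for a real promptable model and uses black-box random search within a ±`eps` pixel budget in place of the paper's white-box (gradient-based) optimization; `worst_case_box`, `eps`, and the toy model are hypothetical names for this sketch, not the BREPS implementation.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def segment(image, box):
    """Toy stand-in for a promptable model: predicts the box interior,
    clipped to the image bounds, as the segmentation mask."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

def worst_case_box(image, gt_mask, box, eps=5.0, iters=200, seed=0):
    """Random search for the box within +/- eps pixels (per coordinate)
    of the user box that minimizes IoU with the ground-truth mask,
    i.e. the adversarial direction under a simple naturalness budget."""
    rng = np.random.default_rng(seed)
    box = np.asarray(box, dtype=float)
    best_box = box.copy()
    best_iou = iou(segment(image, best_box), gt_mask)
    for _ in range(iters):
        cand = box + rng.uniform(-eps, eps, size=4)
        cand_iou = iou(segment(image, cand), gt_mask)
        if cand_iou < best_iou:
            best_box, best_iou = cand, cand_iou
    return best_box, best_iou
```

Flipping the objective (searching for the perturbed box that maximizes IoU) gives the best-case counterpart mentioned in the abstract; the white-box setting in the paper exploits model gradients instead of random sampling to make this search efficient.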
Problem

Research questions and friction points this paper is trying to address.

promptable segmentation
bounding box robustness
natural prompt variation
segmentation evaluation
user annotation variability
Innovation

Methods, ideas, or system contributions that make the work stand out.

promptable segmentation
bounding box robustness
adversarial prompt generation
white-box optimization
user study