Segment Anything, Even Occluded

📅 2025-03-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses amodal instance segmentation—simultaneously detecting and segmenting both visible and occluded parts of objects—to improve perception robustness under occlusion. Existing methods suffer from inflexible joint training of detectors and mask decoders, hindering reuse of pre-trained detectors. To overcome this, we propose SAMEO: the first framework that adapts Segment Anything Model (SAM) as a general-purpose, plug-and-play mask decoder compatible with arbitrary front-end detectors. We further introduce Amodal-LVIS, the first large-scale synthetic dataset (300K images) for amodal segmentation, alleviating the scarcity of real-world occlusion annotations. Additionally, we design a zero-shot cross-domain transfer strategy leveraging synthetic data. On COCOA-cls and D2SA benchmarks, SAMEO achieves state-of-the-art zero-shot performance, demonstrating significantly improved generalization to unseen occlusion patterns without fine-tuning.

📝 Abstract
Amodal instance segmentation, which aims to detect and segment both visible and invisible parts of objects in images, plays a crucial role in various applications including autonomous driving, robotic manipulation, and scene understanding. While existing methods require training both front-end detectors and mask decoders jointly, this approach lacks flexibility and fails to leverage the strengths of pre-existing modal detectors. To address this limitation, we propose SAMEO, a novel framework that adapts the Segment Anything Model (SAM) as a versatile mask decoder capable of interfacing with various front-end detectors to enable mask prediction even for partially occluded objects. Acknowledging the constraints of limited amodal segmentation datasets, we introduce Amodal-LVIS, a large-scale synthetic dataset comprising 300K images derived from the modal LVIS and LVVIS datasets. This dataset significantly expands the training data available for amodal segmentation research. Our experimental results demonstrate that our approach, when trained on the newly extended dataset, including Amodal-LVIS, achieves remarkable zero-shot performance on both COCOA-cls and D2SA benchmarks, highlighting its potential for generalization to unseen scenarios.
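The decoupled design described above, where any front-end detector supplies box prompts to a SAM-style amodal mask decoder, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `detect` and `sam_style_amodal_decode` are hypothetical placeholders standing in for a pre-trained modal detector and the adapted SAM decoder, respectively.

```python
import numpy as np

def detect(image):
    # Hypothetical front-end detector: returns bounding boxes (x0, y0, x1, y1).
    # Because the decoder is plug-and-play, any pre-trained modal detector
    # could be swapped in here without retraining it jointly.
    return [(2, 2, 6, 6)]

def sam_style_amodal_decode(image, box):
    # Placeholder for a SAM-style mask decoder prompted with a detector box.
    # Here we simply fill the box region; the real decoder would predict the
    # full (visible + occluded) object extent, which may extend past the box.
    mask = np.zeros(image.shape[:2], dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    return mask

image = np.zeros((8, 8, 3), dtype=np.uint8)
amodal_masks = [sam_style_amodal_decode(image, b) for b in detect(image)]
```

The key design point is the interface: the decoder consumes only box prompts, so detector and decoder can be trained and upgraded independently.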
Problem

Research questions and friction points this paper is trying to address.

Detecting and segmenting both visible and occluded object parts (amodal instance segmentation).
Joint training of front-end detectors and mask decoders is inflexible and cannot reuse pre-trained modal detectors.
Real-world amodal segmentation datasets with occlusion annotations are scarce.
Innovation

Methods, ideas, or system contributions that make the work stand out.

SAMEO framework adapts SAM for versatile mask decoding
Amodal-LVIS dataset expands amodal segmentation training data
Achieves state-of-the-art zero-shot performance on COCOA-cls and D2SA benchmarks
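The synthetic-data idea behind Amodal-LVIS, deriving occlusion supervision from modal annotations, rests on a simple mask identity: when one object is composited in front of another, the visible mask is the amodal mask minus the occluder. The sketch below illustrates that relation only; the dataset's actual construction pipeline from LVIS and LVVIS is not specified here, and `make_occluded_pair` is a hypothetical helper.

```python
import numpy as np

def make_occluded_pair(amodal_mask, occluder_mask):
    # Given an object's full (amodal) mask and an occluder composited in
    # front of it, the visible mask is everything not hidden by the occluder.
    visible = amodal_mask & ~occluder_mask
    return visible, amodal_mask

# A 4x4 object partially covered by a 3x3 occluder in the top-right corner.
amodal = np.zeros((6, 6), dtype=bool)
amodal[1:5, 1:5] = True
occluder = np.zeros((6, 6), dtype=bool)
occluder[0:3, 3:6] = True

visible, amodal_gt = make_occluded_pair(amodal, occluder)
```

Pairing each visible mask with its known amodal ground truth is what lets modal datasets be recycled into amodal training signal.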