🤖 AI Summary
This paper addresses two key challenges in instance segmentation of X-ray security images: the large appearance gap between prohibited items in X-ray images and natural objects, and severe occlusion that makes masks ambiguous. It proposes an occlusion-aware bilayer mask decoder that explicitly models inter-object occlusion relationships, and integrates the Segment Anything Model (SAM) to exploit its rich priors and zero-shot generalization. To supervise occlusion estimation, the authors introduce PIDray-A and PIXray-A, the first large-scale X-ray instance segmentation datasets featuring fine-grained, human-annotated occlusion annotations. Evaluated on PIDray-A and PIXray-A, the method achieves an 8.2% improvement in mAP, demonstrating substantially enhanced robustness under heavy occlusion. All code and datasets are publicly released.
📝 Abstract
Instance segmentation of prohibited items in security X-ray images is a critical yet challenging task. The difficulty stems mainly from the significant appearance gap between prohibited items in X-ray images and natural objects, and from the severe overlap among objects in X-ray images. To address these issues, we propose an occlusion-aware instance segmentation pipeline designed to identify prohibited items in X-ray images. Specifically, to bridge the representation gap, we integrate the Segment Anything Model (SAM) into our pipeline, taking advantage of its rich priors and zero-shot generalization capabilities. To handle the overlap between prohibited items, we design an occlusion-aware bilayer mask decoder module that explicitly models the occlusion relationships. To supervise occlusion estimation, we manually annotate the occlusion areas of prohibited items in two large-scale X-ray image segmentation datasets, PIDray and PIXray. We then combine these additional annotations with the original annotations to form two occlusion-annotated datasets, PIDray-A and PIXray-A. Extensive experimental results on these occlusion-annotated datasets demonstrate the effectiveness of our proposed method. The datasets and code are available at: https://github.com/Ryh1218/Occ
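To make the notion of an occlusion area concrete, the following minimal sketch computes the region shared by two overlapping instance masks and the fraction of each item that lies inside it. This is only an illustration under stated assumptions: the occlusion areas in PIDray-A and PIXray-A were annotated manually, and the helper names `occlusion_area` and `occlusion_ratio`, the toy masks, and the definition of occlusion as a plain mask intersection are all hypothetical, not taken from the paper or its released code.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 means the pixel belongs to the item


def occlusion_area(mask_a: Mask, mask_b: Mask) -> Mask:
    """Pixels covered by both items (assumed definition: mask intersection)."""
    return [
        [a & b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mask_a, mask_b)
    ]


def occlusion_ratio(mask: Mask, overlap: Mask) -> float:
    """Fraction of an item's pixels that fall inside the overlap region."""
    total = sum(map(sum, mask))
    return sum(map(sum, overlap)) / total if total else 0.0


# Toy 3x4 masks for two hypothetical overlapping items.
knife = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
wrench = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

overlap = occlusion_area(knife, wrench)
knife_ratio = occlusion_ratio(knife, overlap)
wrench_ratio = occlusion_ratio(wrench, overlap)
```

Here `overlap` marks the two pixels in the third column where both toy masks are set, so a third of each item is occluded. A per-item ratio of this kind is one simple way to group test images by occlusion severity when assessing robustness.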