🤖 AI Summary
Panoptic segmentation requires the joint modeling of countable instances and uncountable regions, posing significant challenges in long-range dependency modeling, multi-scale feature fusion, and efficient dense prediction. This work presents the first application of Vision Mamba to panoptic segmentation by introducing an all-Mamba architecture: a Mamba-based backbone for feature extraction, a MambaFPN module enabling global multi-scale fusion with linear computational complexity, and a QuadMamba component for multi-stage feature refinement. Integrated with a PanopticFCN-style kernel generator, the framework achieves proposal-free unified prediction. The proposed method outperforms PanopticDeepLab and PanopticFCN on both Cityscapes and COCO benchmarks under comparable model sizes, and matches or exceeds Mask2Former’s panoptic quality (PQ) and average precision (AP) on Cityscapes while using fewer parameters.
📝 Abstract
Panoptic segmentation requires the simultaneous recognition of countable thing instances and amorphous stuff regions, placing joint demands on long-range context modelling, multi-scale feature representation, and efficient dense prediction. Existing convolutional and transformer-based methods struggle to satisfy all three requirements concurrently: convolutional architectures are limited in their capacity to model long-range dependencies, while transformer-based methods incur quadratic computational cost that is prohibitive at high resolutions. In this paper, we propose MambaPanoptic, a fully Mamba-based panoptic segmentation framework that addresses these limitations through two principal contributions. First, we introduce MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity. Second, we adopt a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, enhanced by a QuadMamba-based feature refinement module applied at multiple network stages. Experiments on the Cityscapes and COCO panoptic segmentation benchmarks demonstrate that MambaPanoptic consistently outperforms PanopticDeepLab and PanopticFCN under comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.
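To make the proposal-free prediction step concrete, here is a minimal NumPy sketch of how PanopticFCN-style kernels are typically applied: each generated kernel, one per thing instance or stuff class, acts as a 1x1 convolution over a shared feature map and yields one mask. This is an illustration under stated assumptions, not MambaPanoptic's implementation. The names `predict_masks`, `features`, and `kernels` are hypothetical, and the sigmoid with a 0.5 threshold is an assumed post-processing choice that the abstract does not specify.

```python
import numpy as np


def predict_masks(features, kernels, threshold=0.5):
    """Apply per-segment kernels to a shared feature map (hypothetical helper).

    features: array of shape (C, H, W), the encoded feature map.
    kernels:  array of shape (N, C), one kernel per thing instance or stuff class.
    Returns a boolean array of shape (N, H, W), one mask per kernel.
    """
    c, h, w = features.shape
    # A 1x1 convolution is a matrix product over the channel axis.
    logits = kernels @ features.reshape(c, h * w)      # (N, H*W)
    probs = 1.0 / (1.0 + np.exp(-logits))              # assumed sigmoid activation
    return (probs > threshold).reshape(-1, h, w)       # assumed fixed threshold


# Toy example: two channels, each active on one row of a 2x2 map.
features = np.zeros((2, 2, 2))
features[0, 0, :] = 5.0   # channel 0 fires on the top row
features[1, 1, :] = 5.0   # channel 1 fires on the bottom row
kernels = np.array([[1.0, -1.0],    # kernel selecting the top-row segment
                    [-1.0, 1.0]])   # kernel selecting the bottom-row segment
masks = predict_masks(features, kernels)
```

Because things and stuff share the same kernel-times-feature formulation, no box proposals are needed; in this sketch the number of predicted segments simply equals the number of kernels.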