Towards Instance Segmentation with Polygon Detection Transformers

📅 2026-03-10
🤖 AI Summary
This work addresses the tension in instance segmentation between high-resolution inputs and lightweight, real-time inference by reframing the task as sparse vertex regression in polar coordinates rather than dense pixel-wise mask prediction. The authors propose a Polygon Detection Transformer (Poly-DETR) with Polar Deformable Attention and a Position-Aware Training Scheme. They also construct a parallel mask-based counterpart to compare polar and mask representations systematically. Poly-DETR improves on state-of-the-art polar-based methods by 4.7 mAP on MS COCO test-dev, reduces memory consumption by almost half on Cityscapes, and surpasses its mask-based counterpart on all metrics on PanNuke and SpaceNet.

📝 Abstract
One of the bottlenecks for instance segmentation today lies in the conflicting requirements of high-resolution inputs and lightweight, real-time inference. To address this bottleneck, we present a Polygon Detection Transformer (Poly-DETR) that reformulates instance segmentation as sparse vertex regression via Polar Representation, thereby eliminating the reliance on dense pixel-wise mask prediction. Considering the box-to-polygon reference shift in Detection Transformers, we propose Polar Deformable Attention and a Position-Aware Training Scheme to dynamically update supervision and focus attention on boundary cues. Compared with state-of-the-art polar-based methods, Poly-DETR achieves a 4.7 mAP improvement on MS COCO test-dev. Moreover, we construct a parallel mask-based counterpart to support a systematic comparison between polar and mask representations. Experimental results show that Poly-DETR is more lightweight in high-resolution scenarios, reducing memory consumption by almost half on the Cityscapes dataset. Notably, on the PanNuke (cell segmentation) and SpaceNet (building footprints) datasets, Poly-DETR surpasses its mask-based counterpart on all metrics, which validates its advantage on regular-shaped instances in domain-specific settings.
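To make the idea of a polar representation concrete, the following is a minimal sketch of how a polar-encoded instance can be decoded into polygon vertices. It is not the Poly-DETR implementation: the function names, the choice of 36 rays, the uniformly spaced angles starting at 0 radians, and the use of a single reference center are all assumptions made for illustration.

```python
import math


def decode_polar_polygon(cx, cy, radii):
    """Convert per-ray distances around a center into (x, y) vertices.

    Assumption: ray k points at angle 2*pi*k/len(radii), measured from
    the positive x-axis. The abstract does not specify the layout.
    """
    n = len(radii)
    vertices = []
    for k, r in enumerate(radii):
        theta = 2.0 * math.pi * k / n
        vertices.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return vertices


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


if __name__ == "__main__":
    # Hypothetical instance: center (100, 50), 36 rays of equal length 10.
    verts = decode_polar_polygon(100.0, 50.0, [10.0] * 36)
    print(len(verts), polygon_area(verts))
```

Under these assumptions, an instance is described by a center plus one distance per ray, so the number of regressed values per instance depends on the ray count rather than on the image resolution, which is the intuition behind the "sparse" framing in the abstract.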
Problem

Research questions and friction points this paper is trying to address.

instance segmentation
high-resolution inputs
real-time inference
dense mask prediction
lightweight models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Polygon Detection Transformer
Polar Representation
Sparse Vertex Regression
Polar Deformable Attention
Instance Segmentation
Jiacheng Sun
Shanghai University, Shanghai, China
Jiaqi Lin
Shanghai University, Shanghai, China
Wenlong Hu
Shanghai University, Shanghai, China
Haoyang Li
Shanghai University, Shanghai, China
Xinghong Zhou
Shanghai University, Shanghai, China
Chenghai Mao
Shanghai University, Shanghai, China
Yan Peng
Professor, Shanghai University
Robotics
Xiaomao Li
Shanghai University, Shanghai, China