SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds

📅 2026-01-13
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses human instance segmentation in dense crowds, where occlusion often leaves pose keypoints partially or fully invisible and degrades conventional methods. The authors adapt SAM 2.1 with minimal encoder modifications and propose PoseMaskRefine, a fine-tuning strategy that incorporates highly visible pose keypoints into SAM’s iterative mask-correction process. At inference, only the three most visible keypoints are used as prompts, and accurate masks can be predicted from as few as a single keypoint. The approach improves robustness and accuracy across multiple datasets, preserves SAM’s generalization capabilities, and reduces the need for a complete set of visible keypoints.

📝 Abstract
Segment Anything (SAM) provides an unprecedented foundation for human segmentation, but may struggle under occlusion, where keypoints may be partially or fully invisible. We adapt SAM 2.1 for pose-guided segmentation with minimal encoder modifications, retaining its strong generalization. Using a fine-tuning strategy called PoseMaskRefine, we incorporate pose keypoints with high visibility into the iterative correction process originally employed by SAM, yielding improved robustness and accuracy across multiple datasets. During inference, we simplify prompting by selecting only the three keypoints with the highest visibility. This strategy reduces sensitivity to common errors, such as missing body parts or misclassified clothing, and allows accurate mask prediction from as few as a single keypoint. Our results demonstrate that pose-guided fine-tuning of SAM enables effective, occlusion-aware human segmentation while preserving the generalization capabilities of the original model. The code and pretrained models will be available at https://mirapurkrabek.github.io/BBox-Mask-Pose/.
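
The inference-time prompting rule described in the abstract (keep only the three keypoints with the highest visibility) can be illustrated with a minimal sketch. This is not the authors' code: the function name `select_top_visible_keypoints`, the `min_visibility` threshold, and the array layout are assumptions made for illustration. The output follows the common SAM-style convention of point coordinates plus a label of 1 for each positive (foreground) point.

```python
import numpy as np


def select_top_visible_keypoints(keypoints, visibility, k=3, min_visibility=0.0):
    """Pick up to k keypoints with the highest visibility as point prompts.

    Hypothetical helper, not taken from the paper's code release.
    keypoints:  (N, 2) array-like of (x, y) pixel coordinates.
    visibility: (N,) array-like of visibility scores (higher = more visible).
    """
    keypoints = np.asarray(keypoints, dtype=np.float32).reshape(-1, 2)
    visibility = np.asarray(visibility, dtype=np.float32).reshape(-1)
    if keypoints.shape[0] != visibility.shape[0]:
        raise ValueError("keypoints and visibility must have the same length")

    # Sort indices by descending visibility; a stable sort keeps the
    # original keypoint order among ties.
    order = np.argsort(-visibility, kind="stable")
    # Drop keypoints at or below the (assumed) visibility threshold,
    # then keep at most k of the remaining ones.
    order = order[visibility[order] > min_visibility][:k]

    point_coords = keypoints[order]
    # Label 1 marks a positive (foreground) point in SAM-style prompts.
    point_labels = np.ones(len(order), dtype=np.int32)
    return point_coords, point_labels, order


# Example: five keypoints, two of them heavily occluded.
kpts = [[120, 40], [118, 90], [130, 150], [95, 60], [140, 62]]
vis = [0.90, 0.10, 0.00, 0.80, 0.95]
coords, labels, idx = select_top_visible_keypoints(kpts, vis, k=3)
print(idx.tolist())     # indices of the selected keypoints
print(coords.tolist())  # their (x, y) coordinates
```

Under heavy occlusion the same function may return fewer than three points, which matches the abstract's statement that a mask can be predicted from as few as a single keypoint. The returned arrays would then be passed to the segmentation model as positive point prompts.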

Problem

Research questions and friction points this paper is trying to address.

human instance segmentation
occlusion
pose-guided segmentation
crowd scenes
keypoint visibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

pose-guided segmentation
Segment Anything Model
occlusion-aware
PoseMaskRefine
human instance segmentation
Constantin Kolomiiets
Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague
Miroslav Purkrabek
Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague
Jiri Matas
Professor, Czech Technical University
computer vision, image processing, pattern recognition, machine learning