🤖 AI Summary
Existing 2D-to-3D pose lifting methods often neglect inter-person relationships or cannot handle varying group sizes, which limits their effectiveness in multi-person settings. To address this limitation, this work proposes MuPPet, a multi-person 2D-to-3D pose lifting framework that explicitly models inter-person correlations. MuPPet combines Person Encoding, Permutation Augmentation, and Dynamic Multi-Person Attention to lift the 2D poses of a group to 3D. The method accommodates varying numbers of people, significantly outperforms state-of-the-art single- and multi-person pose lifting methods on group interaction datasets, and is more robust in occlusion scenarios. These results underscore the importance of modeling inter-person correlations for accurate 3D pose estimation.
📝 Abstract
Multi-person social interactions are inherently built on coherence and relationships among all individuals within the group, making multi-person localization and body pose estimation essential to understanding these social dynamics. One promising approach is 2D-to-3D pose lifting, which builds on the significant advances in 2D pose estimation to produce 3D human poses with rich spatial detail. However, existing 2D-to-3D pose lifting methods often neglect inter-person relationships or cannot handle varying group sizes, limiting their effectiveness in multi-person settings. We propose MuPPet, a novel multi-person 2D-to-3D pose lifting framework that explicitly models inter-person correlations. To leverage these inter-person dependencies, our approach introduces Person Encoding to structure individual representations, Permutation Augmentation to enhance training diversity, and Dynamic Multi-Person Attention to adaptively model correlations between individuals. Extensive experiments on group interaction datasets demonstrate that MuPPet significantly outperforms state-of-the-art single- and multi-person 2D-to-3D pose lifting methods and improves robustness in occlusion scenarios. Our findings highlight the importance of modeling inter-person correlations, paving the way for accurate and socially aware 3D pose estimation. Our code is available at: https://github.com/Thomas-Markhorst/MuPPet
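The abstract names its components without describing how they are implemented, so the following is only a minimal illustrative sketch of two of the ideas, not the authors' code. It assumes that permutation augmentation means shuffling the order of people in a training sample (identically for 2D inputs and 3D targets), and that varying group sizes can be handled by padding to a fixed size and masking the padded slots out of the attention weights. All function names (`permute_persons`, `pad_group`, `masked_softmax`) and the pose layout (a list of people, each a list of joints) are hypothetical.

```python
import math
import random


def permute_persons(poses_2d, poses_3d, rng):
    """Shuffle the person order, applying the same order to inputs and targets.

    Assumption: this is one plausible reading of "permutation augmentation";
    the paper's actual scheme may differ.
    """
    order = list(range(len(poses_2d)))
    rng.shuffle(order)
    shuffled_2d = [poses_2d[i] for i in order]
    shuffled_3d = [poses_3d[i] for i in order]
    return shuffled_2d, shuffled_3d, order


def pad_group(poses, max_persons, num_joints, dims):
    """Pad a variable-size group to max_persons and return a validity mask."""
    n = len(poses)
    # Build a fresh zero pose for every padded slot to avoid shared references.
    padding = [
        [[0.0] * dims for _ in range(num_joints)]
        for _ in range(max_persons - n)
    ]
    mask = [True] * n + [False] * (max_persons - n)
    return list(poses) + padding, mask


def masked_softmax(scores, mask):
    """Softmax over person slots that gives padded slots exactly zero weight.

    Assumes at least one slot is valid.
    """
    top = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three people, one joint each; the x-coordinate encodes the person id.
    group_2d = [[[float(i), 0.0]] for i in range(3)]
    group_3d = [[[float(i), 0.0, 1.0]] for i in range(3)]
    aug_2d, aug_3d, order = permute_persons(group_2d, group_3d, rng)
    padded, mask = pad_group(aug_2d, max_persons=5, num_joints=1, dims=2)
    weights = masked_softmax([1.0, 2.0, 3.0, 9.0, 9.0], mask)
    print(order, mask, weights)
```

In this sketch the padded slots receive zero attention weight regardless of their raw scores, so the same code path serves groups of any size up to `max_persons`. A learned model would compute the scores from per-person features rather than take them as constants.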