🤖 AI Summary
To address the heavy reliance of radar-based indoor 3D human pose estimation on costly, labor-intensive fine-grained 3D keypoint annotations, this paper proposes a weakly supervised learning framework that requires only easily obtainable 3D bounding boxes and 2D keypoint labels. Methodologically, it designs a two-stage pose decoder incorporating pseudo-3D deformable attention to fuse multi-view radar features, and introduces a 3D template loss and a 3D gravity loss to mitigate depth ambiguity. Evaluated on the HIBER and MMVR datasets, the method reduces joint position error by 34.3% and 76.9%, respectively, significantly outperforming existing approaches. To the best of the authors' knowledge, this is the first work to systematically tackle weakly supervised 3D pose estimation in the radar modality, substantially lowering annotation overhead and advancing practical deployment.
📝 Abstract
Radar-based indoor 3D human pose estimation typically relies on fine-grained 3D keypoint labels, which are costly to obtain, especially in complex indoor settings involving clutter, occlusions, or multiple people. In this paper, we propose **RAPTR** (RAdar Pose esTimation using tRansformer) under weak supervision, using only 3D BBox and 2D keypoint labels, which are considerably easier and more scalable to collect. RAPTR is characterized by a two-stage pose decoder architecture with pseudo-3D deformable attention to enhance (pose/joint) queries with multi-view radar features: a pose decoder estimates initial 3D poses with a 3D template loss designed to utilize the 3D BBox labels and mitigate depth ambiguities, and a joint decoder refines the initial poses with 2D keypoint labels and a 3D gravity loss. Evaluated on two indoor radar datasets, RAPTR outperforms existing methods, reducing joint position error by 34.3% on HIBER and 76.9% on MMVR. Our implementation is available at https://github.com/merlresearch/radar-pose-transformer.
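The core idea of pseudo-3D deformable attention, as described above, is to let each 3D query gather features from multiple radar views at sampled locations near its (projected) reference point. The following is a minimal NumPy sketch of that sampling pattern; the function name, the toy "projection" (simply dropping the depth coordinate), and the nearest-neighbor sampling are all illustrative assumptions, not RAPTR's actual implementation.

```python
import numpy as np

def pseudo_3d_deformable_attention(query_xyz, feat_maps, offsets, weights):
    """Illustrative sketch: for each view, sample the 2D feature map at the
    projection of a 3D reference point plus learned offsets, then return the
    attention-weighted sum of the sampled features.

    query_xyz : (3,) 3D reference point of one query (toy units = pixels)
    feat_maps : list of per-view feature maps, each (H, W, C)
    offsets   : (V, K, 2) learned 2D sampling offsets per view
    weights   : (V, K) attention weights per sampled location
    """
    sampled = []
    for view, fmap in enumerate(feat_maps):
        H, W, _ = fmap.shape
        # Toy "projection": drop depth; a real model would use view geometry.
        u, v = query_xyz[0], query_xyz[1]
        for k in range(offsets.shape[1]):
            du, dv = offsets[view, k]
            # Nearest-neighbor sample, clamped to the feature-map bounds
            # (a real implementation would use bilinear interpolation).
            x = int(np.clip(round(u + du), 0, W - 1))
            y = int(np.clip(round(v + dv), 0, H - 1))
            sampled.append(weights[view, k] * fmap[y, x])
    return np.sum(sampled, axis=0)  # fused (C,) feature for this query

# Tiny usage example: two views, 3 sampling points each, C = 4 channels.
rng = np.random.default_rng(0)
feat_maps = [rng.standard_normal((8, 8, 4)) for _ in range(2)]
offsets = rng.standard_normal((2, 3, 2))
weights = np.full((2, 3), 1.0 / 6.0)  # uniform attention over 6 samples
fused = pseudo_3d_deformable_attention(np.array([4.0, 4.0, 2.0]),
                                       feat_maps, offsets, weights)
```

In the two-stage design, a fused feature like `fused` would update each pose/joint query before the decoder regresses or refines the 3D keypoints.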