MultiFormer: A Multi-Person Pose Estimation System Based on CSI and Attention Mechanism

📅 2025-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

CSI-based multi-person pose estimation faces three main challenges: severe occlusion, inaccurate localization of highly mobile joints (e.g., wrists and elbows), and difficulty in modeling time-frequency features. To address these, this paper proposes a non-intrusive multi-person pose estimation method based on a novel time-frequency dual-token Transformer. Its core contributions are: (1) the first time-frequency dual-token Transformer, which jointly models the temporal dynamics and spectral structure of CSI signals; and (2) a multi-stage feature fusion network (MSFN) that deeply integrates CSI features with pose heatmaps while explicitly embedding anatomical constraints. Evaluated on the MM-Fi benchmark and a custom-built dataset, the method substantially outperforms state-of-the-art approaches, with a 12.7% average precision gain for highly mobile joints, demonstrating superior robustness and generalization capability.

📝 Abstract

Human pose estimation based on Channel State Information (CSI) has emerged as a promising approach for non-intrusive and precise human activity monitoring, yet it faces challenges including accurate multi-person pose recognition and effective CSI feature learning. This paper presents MultiFormer, a wireless sensing system that accurately estimates human pose from CSI. The proposed system adopts a Transformer-based time-frequency dual-token feature extractor with multi-head self-attention, which models the inter-subcarrier correlations and temporal dependencies of the CSI. The extracted CSI features and the pose probability heatmaps are then fused by a Multi-Stage Feature Fusion Network (MSFN) to enforce anatomical constraints. Extensive experiments on the public MM-Fi dataset and our self-collected dataset show that MultiFormer achieves higher accuracy than state-of-the-art approaches, especially for high-mobility keypoints (wrists, elbows) that previous methods struggle to estimate accurately.
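The abstract does not spell out how the dual tokens are built, so the following is only a minimal sketch of one plausible reading: a CSI window is treated as a (time steps x subcarriers) matrix, rows serve as time tokens, columns serve as frequency tokens, and each token set passes through scaled dot-product self-attention. The window size (32 x 56), the model width, the random projection weights, and the helper names `self_attention` and `dual_token_features` are all assumptions made for illustration, and a single attention head is used for brevity where the paper uses multi-head attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, rng, d_model=16):
    # Single-head scaled dot-product self-attention.
    # The projections are random stand-ins for learned weights (assumption).
    n_tokens, d_in = tokens.shape
    w_q, w_k, w_v = (rng.standard_normal((d_in, d_model)) / np.sqrt(d_in)
                     for _ in range(3))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(d_model))  # (n_tokens, n_tokens)
    return weights @ v, weights

def dual_token_features(csi, seed=0):
    # csi has shape (T, S): T time steps, S subcarriers.
    rng = np.random.default_rng(seed)
    time_tokens = csi      # one token per time step -> temporal dependencies
    freq_tokens = csi.T    # one token per subcarrier -> inter-subcarrier correlations
    time_out, time_w = self_attention(time_tokens, rng)
    freq_out, freq_w = self_attention(freq_tokens, rng)
    return time_out, freq_out, time_w, freq_w

# Hypothetical CSI amplitude window: 32 time steps, 56 subcarriers.
csi = np.random.default_rng(1).standard_normal((32, 56))
time_out, freq_out, time_w, freq_w = dual_token_features(csi)
print(time_out.shape, freq_out.shape)
```

In this reading, the attention weights over time tokens capture which time steps inform each other, while the weights over frequency tokens capture which subcarriers co-vary; how MultiFormer actually combines the two streams and feeds them to the MSFN is not described in this summary.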

Problem

Research questions and friction points this paper is trying to address.

Accurate multi-person pose recognition using CSI

Effective CSI feature learning with Transformer

Improved estimation for high-mobility keypoints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based time-frequency dual-token feature extractor

Multi-head self-attention to model inter-subcarrier correlations and temporal dependencies in CSI

Multi-Stage Feature Fusion Network (MSFN) to enforce anatomical constraints
