MultiFormer: A Multi-Person Pose Estimation System Based on CSI and Attention Mechanism

📅 2025-05-28
🤖 AI Summary
In CSI-based multi-person pose estimation, severe occlusion, inaccurate localization of highly mobile joints (e.g., wrists and elbows), and difficulty in modeling time-frequency features pose significant challenges. To address these, this paper proposes a non-intrusive multi-person pose estimation method based on a novel time-frequency dual-token Transformer. Its core contributions are: (1) the first time-frequency dual-token Transformer, jointly modeling the temporal dynamics and spectral structure of CSI signals; and (2) a multi-stage feature fusion network (MSFN) that deeply integrates CSI features with pose heatmaps while explicitly embedding anatomical constraints. Evaluated on the MM-Fi benchmark and a custom-built dataset, the method substantially outperforms state-of-the-art approaches, achieving a 12.7% average precision gain for highly mobile joints and demonstrating superior robustness and generalization capability.

📝 Abstract
Human pose estimation based on Channel State Information (CSI) has emerged as a promising approach for non-intrusive and precise human activity monitoring, yet it faces challenges including accurate multi-person pose recognition and effective CSI feature learning. This paper presents MultiFormer, a wireless sensing system that accurately estimates human pose from CSI. The proposed system adopts a Transformer-based time-frequency dual-token feature extractor with multi-head self-attention, which models the inter-subcarrier correlations and temporal dependencies of the CSI. The extracted CSI features and the pose probability heatmaps are then fused by a Multi-Stage Feature Fusion Network (MSFN) to enforce anatomical constraints. Extensive experiments on the public MM-Fi dataset and our self-collected dataset show that MultiFormer achieves higher accuracy than state-of-the-art approaches, especially for high-mobility keypoints (wrists, elbows) that previous methods struggle to estimate accurately.
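To make the dual-token idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the CSI is a T x F amplitude matrix (T time samples, F subcarriers), treats each row as a "time token" and each column as a "frequency token", and applies single-head scaled dot-product self-attention with identity Q/K/V projections to each token set. The function names (`dual_tokens`, `self_attention`), the toy sizes, and the concatenation used to merge the two branches are all illustrative assumptions; the abstract does not specify the tokenization, the number of heads, or how the branches are combined.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): Q, K and V are the tokens themselves,
    i.e. the learned projection matrices are replaced by the identity.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def dual_tokens(csi):
    """Split a T x F CSI matrix into time tokens and frequency tokens."""
    time_tokens = [list(row) for row in csi]        # T tokens, each of length F
    freq_tokens = [list(col) for col in zip(*csi)]  # F tokens, each of length T
    return time_tokens, freq_tokens


random.seed(0)
T, F = 6, 4  # toy sizes, chosen only for illustration
csi = [[random.random() for _ in range(F)] for _ in range(T)]

time_tokens, freq_tokens = dual_tokens(csi)
time_feat = self_attention(time_tokens)  # attention across time steps, T x F
freq_feat = self_attention(freq_tokens)  # attention across subcarriers, F x T

# Hypothetical merge: transpose the frequency branch back to T x F and
# concatenate both branches along the feature axis.
freq_feat_t = [list(col) for col in zip(*freq_feat)]
fused = [t + f for t, f in zip(time_feat, freq_feat_t)]

print(len(fused), len(fused[0]))  # prints "6 8"
```

In this sketch, attention over time tokens captures temporal dependencies, while attention over frequency tokens captures inter-subcarrier correlations, which mirrors the two roles the abstract assigns to the feature extractor.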
Problem

Research questions and friction points this paper is trying to address.

Accurate multi-person pose recognition using CSI
Effective CSI feature learning with Transformer
Improved estimation for high-mobility keypoints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based time-frequency dual-token feature extractor
Multi-head self-attention for CSI correlations
Multi-Stage Feature Fusion Network for constraints
Yanyi Qu
School of Information and Communication Engineering, University of Electronic Science and Technology of China
Haoyang Ma
HKUST
random program generator, compiler testing, bug localization
Wenhui Xiong
National Key Laboratory of Science and Technology on Communications, UESTC