FlashCap: Millisecond-Accurate Human Motion Capture via Flashing LEDs and Event-Based Vision

📅 2026-03-20
🤖 AI Summary
Existing approaches struggle to achieve millisecond-level, high-precision human motion capture in everyday settings because of high cost, substantial bandwidth requirements, and poor robustness in low light. This work proposes FlashCap, a system that pairs flashing LED markers with an event camera and records RGB, LiDAR, and IMU streams alongside it to capture motion at high temporal resolution. It introduces FlashMotion, described as the first millisecond-scale multimodal human motion dataset, and presents ResPose, a residual learning-based pose estimation algorithm. Experimental results show that ResPose reduces pose estimation error by approximately 40%, supporting the usefulness of both the FlashCap system and the FlashMotion dataset for high-frame-rate, high-accuracy pose estimation.

๐Ÿ“ Abstract
Precise motion timing (PMT) is crucial for swift motion analysis. A millisecond difference may determine victory or defeat in sports competitions. Despite substantial progress in human pose estimation (HPE), PMT remains largely overlooked by the HPE community due to the limited availability of high-temporal-resolution labeled datasets. Today, PMT is achieved using high-speed RGB cameras in specialized scenarios such as the Olympic Games; however, their high costs, light sensitivity, bandwidth, and computational complexity limit their feasibility for daily use. We developed FlashCap, the first flashing LED-based MoCap system for PMT. With FlashCap, we collect a millisecond-resolution human motion dataset, FlashMotion, comprising the event, RGB, LiDAR, and IMU modalities, and demonstrate its high quality through rigorous validation. To evaluate the merits of FlashMotion, we perform two tasks: precise motion timing and high-temporal-resolution HPE. For these tasks, we propose ResPose, a simple yet effective baseline that learns residual poses based on events and RGBs. Experimental results show that ResPose reduces pose estimation errors by ~40% and achieves millisecond-level timing accuracy, enabling new research opportunities. The dataset and code will be shared with the community.
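The abstract says only that ResPose "learns residual poses based on events and RGBs"; it gives no architecture. The sketch below is therefore not the authors' implementation. It illustrates the general residual idea under stated assumptions: a coarse pose is obtained by linearly interpolating low-rate per-frame poses to millisecond timestamps, and a correction (which in practice a learned, event-driven model would predict) is added on top. The function names, the array shape (time, joints, 3), and the constant placeholder residual are all hypothetical.

```python
import numpy as np


def interpolate_coarse_pose(frame_times_ms, frame_poses, query_times_ms):
    """Linearly interpolate low-rate poses of shape (T, J, 3) to query timestamps."""
    frame_times_ms = np.asarray(frame_times_ms, dtype=float)
    frame_poses = np.asarray(frame_poses, dtype=float)
    query_times_ms = np.asarray(query_times_ms, dtype=float)
    num_frames, num_joints, dims = frame_poses.shape
    # Flatten joints and coordinates so each column is one scalar trajectory.
    flat = frame_poses.reshape(num_frames, num_joints * dims)
    columns = [
        np.interp(query_times_ms, frame_times_ms, flat[:, k])
        for k in range(flat.shape[1])
    ]
    return np.stack(columns, axis=1).reshape(len(query_times_ms), num_joints, dims)


def refine_with_residual(coarse_poses, residuals):
    """Residual formulation: refined pose = coarse pose + predicted correction."""
    coarse_poses = np.asarray(coarse_poses, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    if coarse_poses.shape != residuals.shape:
        raise ValueError("coarse poses and residuals must have the same shape")
    return coarse_poses + residuals


# Toy data: two frames 10 ms apart, two joints, queried at 0, 5, and 10 ms.
frame_times_ms = [0.0, 10.0]
frame_poses = np.stack([np.zeros((2, 3)), np.ones((2, 3))])
query_times_ms = [0.0, 5.0, 10.0]

coarse = interpolate_coarse_pose(frame_times_ms, frame_poses, query_times_ms)
# Placeholder residual; a trained model would predict this from event data.
residuals = np.full(coarse.shape, 0.1)
refined = refine_with_residual(coarse, residuals)
```

Predicting a correction rather than the full pose means the model only has to account for what the coarse estimate misses between frames, which is one common motivation for residual formulations.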
Problem

Research questions and friction points this paper is trying to address.

Precise Motion Timing
Human Pose Estimation
High-temporal-resolution Dataset
Motion Capture
Event-based Vision
Innovation

Methods, ideas, or system contributions that make the work stand out.

event-based vision
flashing LEDs
millisecond-accurate motion capture
high-temporal-resolution pose estimation
multi-modal dataset
Zekai Wu
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University
Shuqi Fan
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University
Mengyin Liu
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University
Yuhua Luo
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University
Xincheng Lin
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University
Ming Yan
Xiamen University
Human Pose Estimation, Computer Vision, LiDAR, Point Cloud, Imbalanced Data
Junhao Wu
Towson University
Computer Vision, Cryo-EM, Medical Image
Xiuhong Lin
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University
Yuexin Ma
Assistant Professor, School of Information Science and Technology, ShanghaiTech University
computer vision, embodied AI, autonomous driving
Chenglu Wen
Professor of Xiamen University
3D vision, point clouds, mobile mapping, robotics
Lan Xu
ShanghaiTech University
Siqi Shen
Xiamen University
Reinforcement Learning, 3D Vision
Cheng Wang
Fujian Key Laboratory of Urban Intelligent Sensing and Computing, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University