🤖 AI Summary
Acquiring high-quality multi-person motion data in large-scale, heavily occluded environments remains challenging. Method: This paper proposes a scalable motion capture paradigm based on Ultra-Wideband (UWB) localization and systematically evaluates UWB's localization robustness under realistic museum-like conditions, with up to four participants navigating naturally amid frequent occlusions. The approach combines UWB with eye-tracking, robot-mounted LiDAR and radar, and optical motion capture (serving as ground truth) to build a multimodal dataset exceeding 130 minutes. Contribution/Results: The framework avoids the spatial constraints, frequent recalibration, and high costs inherent to conventional optical systems. The results demonstrate its feasibility for large-scale field deployment, laying a reliable, low-cost perception foundation for safe and efficient human-robot collaboration in expansive open environments such as warehouses and airports.
📝 Abstract
With robots increasingly integrating into human environments, understanding and predicting human motion is essential for safe and efficient interaction. Modern human motion and activity prediction approaches require large quantities of high-quality data for training and evaluation, typically collected from motion capture systems or from onboard or stationary sensors. Setting up these systems is challenging due to the intricate assembly of hardware components, extensive calibration procedures, occlusions, and substantial costs. These constraints make such systems difficult to deploy in new and large environments and limit their usability for in-the-wild measurements. In this paper, we investigate the possibility of applying Ultra-Wideband (UWB) localization technology as a scalable alternative for human motion capture in crowded and occlusion-prone environments. We include additional sensing modalities such as eye-tracking and onboard robot LiDAR and radar sensors, and record motion capture data as ground truth for evaluation and comparison. The environment imitates a museum setup, with up to four active participants navigating naturally toward random goals, and offers more than 130 minutes of multi-modal data. Our investigation is a step toward scalable and accurate motion data collection beyond vision-based systems, laying a foundation for evaluating sensing modalities like UWB in larger, more complex environments such as warehouses, airports, and convention centers.