AI Summary
This work addresses the limited understanding of athletic intent in existing egocentric video datasets, which predominantly focus on action recognition and lack support for high-speed sports scenarios. To bridge this gap, we introduce motion intent recognition into egocentric fast-motion video analysis for the first time. We propose a foundation model-based approach for real-time focus-of-motion identification that infers the wearer's intent through camera pose estimation. By integrating a sliding-batch inference mechanism with system-level optimizations, our method achieves real-time performance with low memory consumption on a newly curated dataset. The resulting framework is deployable on edge devices, offering an efficient and practical solution for intent-aware analysis in motion-intensive environments.
Abstract
From Vision-Language-Action (VLA) systems to robotics, existing egocentric datasets primarily focus on action recognition tasks, while largely overlooking the inherent role of motion analysis in sports and other fast-movement scenarios. To bridge this gap, we propose a real-time motion focus recognition method that estimates the subject's locomotion intention from any egocentric video. We leverage a foundation model for camera pose estimation and introduce system-level optimizations to enable efficient and scalable inference. Evaluated on a collected egocentric action dataset, our method achieves real-time performance with manageable memory consumption through a sliding-batch inference strategy. This work makes motion-centric analysis practical for edge deployment and offers a complementary perspective to existing egocentric studies on sports and fast-movement activities.
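To make the sliding-batch idea concrete, below is a minimal sketch of how overlapping fixed-size batches can bound memory while streaming frames to a pose model. The function name, window size, and stride are illustrative assumptions, not the paper's actual implementation; `infer_batch` stands in for the foundation-model camera-pose call.

```python
from collections import deque

def sliding_batch_inference(frames, infer_batch, batch_size=8, stride=4):
    """Run a pose model over a frame stream in overlapping batches.

    Hypothetical sketch: `infer_batch` is a placeholder for the
    camera-pose model; batch_size/stride values are illustrative.
    Memory stays bounded because at most `batch_size` frames are
    resident in the window at any time.
    """
    window = deque(maxlen=batch_size)  # oldest frames drop out automatically
    outputs = []
    for i, frame in enumerate(frames):
        window.append(frame)
        # Infer once the window fills, then again every `stride` frames,
        # reusing the overlapping frames already in the window.
        if len(window) == batch_size and (i + 1 - batch_size) % stride == 0:
            outputs.extend(infer_batch(list(window)))
    return outputs
```

With a stride smaller than the batch size, consecutive batches overlap, so per-frame pose estimates can be smoothed across batches; with stride equal to batch size, batches are disjoint and each frame is processed exactly once.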