🤖 AI Summary
Existing action quality assessment (AQA) methods are limited to single-view competitive sports and the RGB modality, rendering them inadequate for risk identification and precise guidance in professional fitness training, especially for weight-loaded exercises. To address this, we introduce FLEX, the first large-scale, multimodal, multi-action dataset tailored for fitness AQA, comprising multi-view RGB videos, high-fidelity 3D pose sequences, surface electromyography (sEMG) signals, and physiological measurements. We pioneer the integration of sEMG into fitness AQA and propose a knowledge-graph-based annotation scheme whose rules take the form of penalty functions, linking weight-loaded actions, action keysteps, error types, and corrective feedback. Baseline experiments demonstrate that multimodal data, multi-view inputs, and fine-grained annotations significantly improve model performance. This work establishes a new benchmark and technical foundation for AI-driven, evidence-based fitness coaching.
📝 Abstract
With increasing health awareness and a growing desire for an aesthetic physique, fitness has become a prevailing trend. However, the potential risks associated with fitness training, especially with weight-loaded fitness actions, cannot be overlooked. Action Quality Assessment (AQA), a technology that quantifies the quality of human actions and provides feedback, has the potential to help fitness enthusiasts of varying skill levels achieve better training outcomes. Nevertheless, current AQA methodologies and datasets are limited to single-view competitive sports scenarios and the RGB modality, and they lack professional assessment and guidance for fitness actions. To address this gap, we propose the FLEX dataset, the first multi-modal, multi-action, large-scale dataset that incorporates surface electromyography (sEMG) signals into AQA. FLEX utilizes high-precision MoCap to collect 20 different weight-loaded actions performed by 38 subjects across 3 different skill levels for 10 repetitions each, and contains RGB video from 5 different views, 3D pose, sEMG, and physiological information. Additionally, FLEX incorporates knowledge graphs into AQA, constructing annotation rules in the form of penalty functions that map weight-loaded actions, action keysteps, error types, and feedback. We evaluated various baseline methods on FLEX, demonstrating that multi-modal data, multi-view data, and fine-grained annotations significantly enhance model performance. FLEX not only advances AQA methodologies and datasets towards multi-modal and multi-action scenarios but also fosters the integration of artificial intelligence within the fitness domain. Dataset and code are available at https://haoyin116.github.io/FLEX_Dataset.
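To make the idea of penalty-function annotation rules more concrete, the following is a minimal Python sketch of how an action, its keysteps, error types, and feedback might be linked and turned into a score. It is an illustration only, not the FLEX annotation format: the `ErrorRule` class, the `RULES` table, the `assess` function, the example action and error names, the penalty values, and the full score of 10 are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRule:
    """One hypothetical edge set: keystep -> error type -> penalty and feedback."""
    keystep: str      # action keystep in which the error occurs
    error_type: str   # identifier of the error
    penalty: float    # points deducted when the error is observed
    feedback: str     # corrective feedback shown to the trainee


# Hypothetical rules for a single action; every name and number is invented.
RULES = {
    "barbell_squat": [
        ErrorRule("descent", "knees_cave_in", 2.0, "Push your knees outward."),
        ErrorRule("bottom", "insufficient_depth", 1.5, "Squat deeper."),
        ErrorRule("ascent", "rounded_back", 3.0, "Keep your spine neutral."),
    ],
}


def assess(action, observed_errors, full_score=10.0):
    """Subtract the penalty of each observed error and collect its feedback."""
    score = full_score
    feedback = []
    for rule in RULES[action]:
        if rule.error_type in observed_errors:
            score -= rule.penalty
            feedback.append(f"[{rule.keystep}] {rule.feedback}")
    # Clamp so that many errors cannot produce a negative score.
    return max(score, 0.0), feedback


if __name__ == "__main__":
    score, feedback = assess("barbell_squat", {"knees_cave_in", "rounded_back"})
    print(score)
    for line in feedback:
        print(line)
```

In this sketch the quality score is simply the full score minus the penalties of the observed errors, and each deduction carries the feedback attached to the same rule, so a score can always be traced back to specific keysteps and errors.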