Learning Skill-Attributes for Transferable Assessment in Video

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing sports skill assessment models generalize poorly and rely heavily on large-scale, sport-specific annotations. To address this, we propose CrossTrainer, the first transferable video representation framework that explicitly models generic, cross-sport skill attributes (e.g., balance, control, hand positioning). Methodologically, CrossTrainer combines self-supervised representation learning with multimodal language models to disentangle skill attributes and align them semantically; it further supports fine-grained linguistic feedback generation and proficiency-level classification. The key contribution is introducing transferable skill attributes into video-based skill assessment, breaking away from the conventional single-sport paradigm. Evaluated on multiple cross-sport and within-sport benchmarks, CrossTrainer achieves state-of-the-art performance, with up to 60% relative improvement over prior methods, substantially improving generalizability and practical applicability.

📝 Abstract
Skill assessment from video entails rating the quality of a person's physical performance and explaining what could be done better. Today's models specialize for an individual sport, and suffer from the high cost and scarcity of expert-level supervision across the long tail of sports. Towards closing that gap, we explore transferable video representations for skill assessment. Our CrossTrainer approach discovers skill-attributes, such as balance, control, and hand positioning -- whose meaning transcends the boundaries of any given sport, then trains a multimodal language model to generate actionable feedback for a novel video, e.g., "lift hands more to generate more power" as well as its proficiency level, e.g., early expert. We validate the new model on multiple datasets for both cross-sport (transfer) and intra-sport (in-domain) settings, where it achieves gains up to 60% relative to the state of the art. By abstracting out the shared behaviors indicative of human skill, the proposed video representation generalizes substantially better than an array of existing techniques, enriching today's multimodal large language models.
Problem

Research questions and friction points this paper is trying to address.

Developing transferable video representations for skill assessment across different sports
Discovering universal skill-attributes like balance and control that transcend specific sports
Generating actionable feedback and proficiency levels from video using multimodal language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discovers cross-sport skill-attributes like balance and control
Trains multimodal language model to generate actionable feedback
Validates model on cross-sport and intra-sport skill assessment
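To make the attribute-to-proficiency idea concrete, here is a minimal sketch in Python. All names, scores, and thresholds are hypothetical illustrations, not the paper's API: the paper learns the mapping from video to feedback and proficiency with a multimodal language model, whereas this sketch simply pools attribute scores and applies fixed cutoffs.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical cross-sport skill attributes, following the paper's
# examples (balance, control, hand positioning); scores in [0, 1].
@dataclass
class SkillAttributes:
    balance: float
    control: float
    hand_positioning: float

    def as_list(self) -> list[float]:
        return [self.balance, self.control, self.hand_positioning]

# Illustrative thresholds only; CrossTrainer does not use fixed cutoffs.
LEVELS = [(0.85, "expert"), (0.7, "early expert"),
          (0.5, "intermediate"), (0.0, "novice")]

def proficiency(attrs: SkillAttributes) -> str:
    """Map the averaged attribute scores to a coarse proficiency label."""
    score = mean(attrs.as_list())
    for threshold, label in LEVELS:
        if score >= threshold:
            return label
    return "novice"

print(proficiency(SkillAttributes(0.8, 0.75, 0.7)))  # early expert (mean 0.75)
```

Because the attribute names (balance, control, hand positioning) are sport-agnostic, the same interface could in principle score a skater, a golfer, or a climber, which is the transferability the paper argues for.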