🤖 AI Summary
Problem: Prior work lacks systematic modeling of, and empirical data on, how struggle behaviors evolve during skill acquisition. Method: We introduce the first large-scale, multi-task video dataset, spanning 4 activity categories and 18 tasks, with 5,385 finely annotated struggle segments collected from 76 participants over five practice rounds. We formalize struggle determination as a temporal action localization (TAL) task and evaluate it under cross-task and cross-activity generalization settings. Contribution/Results: TAL models achieve an overall average mAP of 34.56% in the cross-task setting and 19.24% in the cross-activity setting, suggesting that struggle representations transfer across tasks and establishing the first benchmark for this problem. This work provides both foundational data and a technical paradigm for learning-state-aware, adaptive instructional systems.
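To make the two generalization settings concrete, here is a minimal sketch of leave-one-out splitting. It assumes that each cross-task (or cross-activity) fold holds out every video of one task (or one activity) for testing and trains on the rest, with mAP then averaged over folds. The fold structure, the `Video` record, and `leave_one_out_splits` are hypothetical illustrations; the summary does not specify the exact protocol.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Video:
    """Hypothetical per-video record; task ids are assumed unique across activities."""
    video_id: str
    activity: str  # e.g. "knots", "origami", "tangram", "cards"
    task: str      # one of the 18 task variations


def leave_one_out_splits(
    videos: List[Video], level: str
) -> Iterator[Tuple[str, List[Video], List[Video]]]:
    """Yield (held_out, train, test), holding out one task or one activity per fold."""
    if level not in ("task", "activity"):
        raise ValueError("level must be 'task' or 'activity'")
    for held_out in sorted({getattr(v, level) for v in videos}):
        train = [v for v in videos if getattr(v, level) != held_out]
        test = [v for v in videos if getattr(v, level) == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    videos = [
        Video("v1", "knots", "knots_t1"),
        Video("v2", "knots", "knots_t2"),
        Video("v3", "origami", "origami_t1"),
    ]
    for held_out, train, test in leave_one_out_splits(videos, "activity"):
        print(held_out, [v.video_id for v in train], [v.video_id for v in test])
```

Under this assumption, the cross-activity setting is strictly harder than the cross-task one: a held-out activity removes every related task from training, whereas a held-out task still leaves sibling tasks from the same activity in the training set.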
📝 Abstract
The ability to determine when a person struggles during skill acquisition is crucial both for optimizing human learning and for developing effective assistive systems. As skills develop, the type and frequency of struggles tend to change, and understanding this evolution is key to determining a user's current stage of learning. However, existing manipulation datasets have not focused on how struggle evolves over time. In this work, we collect a dataset for struggle determination featuring 61.68 hours of video recordings, 2,793 videos, and 5,385 annotated temporal struggle segments collected from 76 participants. The dataset includes 18 tasks grouped into four diverse activities (tying knots, origami, tangram puzzles, and shuffling cards), with the tasks within each activity representing different variations. In addition, participants repeated the same task five times to capture how their skill evolves. We define the struggle determination problem as a temporal action localization task: identifying struggle segments and precisely localizing each one by its start and end times. Experimental results show that temporal action localization (TAL) models can successfully learn to detect struggle cues, even when evaluated on unseen tasks or activities. The models attain an overall average mAP of 34.56% when generalizing across tasks and 19.24% when generalizing across activities, indicating that struggle is a transferable concept across skill-based tasks while leaving substantial room for improvement in struggle detection. Our dataset is available at https://github.com/FELIXFENG2019/EvoStruggle.
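Because struggle determination is framed as temporal action localization, predicted segments are scored with segment-level mAP. The sketch below shows one common way to compute it, following an ActivityNet-style recipe: score-ranked predictions are greedily matched to ground-truth segments by temporal IoU (tIoU), interpolated AP is computed per threshold, and the results are averaged over thresholds. Treating struggle as a single class, the helper names, and the default threshold set (0.3 to 0.7, as in THUMOS14-style evaluation) are assumptions; the abstract does not state which thresholds the benchmark uses.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]            # (start_sec, end_sec)
Prediction = Tuple[str, Segment, float]  # (video_id, segment, confidence)
GroundTruth = Tuple[str, Segment]        # (video_id, segment)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Prediction], gts: Sequence[GroundTruth], tiou: float) -> float:
    """Interpolated AP for a single 'struggle' class at one tIoU threshold."""
    if not gts:
        return 0.0
    gt_by_video: Dict[str, List[int]] = {}
    for j, (vid, _) in enumerate(gts):
        gt_by_video.setdefault(vid, []).append(j)
    matched = [False] * len(gts)
    hits: List[int] = []
    # Visit predictions from most to least confident; each ground-truth segment
    # can be claimed once, so later duplicates count as false positives.
    for vid, seg, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_iou, best_j = 0.0, -1
        for j in gt_by_video.get(vid, []):
            if not matched[j]:
                iou = temporal_iou(seg, gts[j][1])
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= tiou:
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)
    precisions: List[float] = []
    recalls: List[float] = []
    cum_tp = 0
    for i, hit in enumerate(hits, start=1):
        cum_tp += hit
        precisions.append(cum_tp / i)
        recalls.append(cum_tp / len(gts))
    # Precision envelope: replace each precision with the best precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(
    preds: Sequence[Prediction],
    gts: Sequence[GroundTruth],
    thresholds: Sequence[float] = (0.3, 0.4, 0.5, 0.6, 0.7),
) -> float:
    """mAP as the mean AP over tIoU thresholds (the threshold set is an assumption)."""
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)
```

For example, with ground-truth struggle segments at 0 to 10 s and 20 to 30 s in one video, a prediction at 21 to 30 s has a tIoU of 0.9 with the second segment: it counts as a hit at a 0.5 threshold but as a false positive at 0.95, which is why stricter thresholds lower mAP even when predictions are close.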