🤖 AI Summary
This work addresses three problems in dense temporal action annotation: inefficiency, imprecise boundary localization, and limited reuse of human-in-the-loop information. These problems often stem from annotation tools that respond only passively to user edits. To overcome them, we propose a closed-loop interactive annotation framework that, for the first time, integrates uncertainty-awareness, structured label propagation, and correction-driven adaptation into temporal action segmentation. The approach dynamically optimizes human–machine collaboration by combining boundary scribble supervision, local action proposal modeling, and cost-aware query planning. Experiments and user studies show that, for the same annotation effort, the proposed method substantially improves boundary accuracy, overall annotation quality, and the efficiency of collaboration between annotators and the system.
📝 Abstract
Dense temporal annotation of procedural activity videos is vital for action understanding and embodied intelligence, but it remains labor-intensive because existing annotation tools are reactive. Each correction is treated as an isolated edit, which limits the reuse of information about annotator uncertainty and model reliability. We introduce IMPACT-Scribe, a correction-driven framework for dense labeling that uses each correction to improve future human-machine collaboration. IMPACT-Scribe combines uncertainty-aware boundary scribble supervision, local proposal modeling, cost-aware query planning, structured propagation, and correction-driven adaptation. Experiments and a human study show that this closed-loop design improves labeling quality per unit of annotation effort, enhances boundary accuracy, and fosters better human-machine interaction over time. The code will be made publicly available at https://github.com/BanzQians/IMPACT_AS.
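The abstract does not specify how cost-aware query planning is implemented. To make the general idea concrete, here is a minimal sketch of one common approach: greedily choosing which frames to ask the annotator about by predictive entropy per unit of annotation cost, under a fixed budget. The function names (`entropy`, `plan_queries`), the per-frame cost model, and the greedy rule are all assumptions for illustration and are not taken from IMPACT-Scribe.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one frame's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def plan_queries(frame_probs, costs, budget):
    """Hypothetical greedy planner: pick frames with the highest
    uncertainty-per-cost ratio until the budget is used up.

    frame_probs: list of per-frame class probability lists
    costs:       list of positive annotation costs, one per frame
    budget:      total annotation cost the annotator can spend
    Returns the selected frame indices in ascending order.
    """
    scored = [
        (entropy(p) / c, i)
        for i, (p, c) in enumerate(zip(frame_probs, costs))
    ]
    # Highest ratio first; ties are broken by the lower frame index.
    scored.sort(key=lambda item: (-item[0], item[1]))

    chosen, spent = [], 0.0
    for ratio, i in scored:
        if ratio <= 0.0:
            break  # remaining frames are already certain
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
    return sorted(chosen)


# Illustrative inputs (made up for this sketch, not experimental data).
frame_probs = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
costs = [1.0, 1.0, 1.0, 2.0]
selected = plan_queries(frame_probs, costs, budget=2.0)
```

In this toy input, frame 3 is more uncertain than frame 2 but costs twice as much, so it does not fit in the remaining budget and the cheaper frame is queried instead. A correction-driven system would presumably update both the uncertainty estimates and the cost model after each annotator response, which this sketch leaves out.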