🤖 AI Summary
This work addresses a key limitation of existing approaches to learning from human corrections of robot behavior: they neglect task-relevant information embedded in the timing of interventions. The paper introduces a systematic framework that treats human intervention timing as an independent learning signal. By integrating temporal analysis of interventions, trajectory preference modeling, and goal inference within a unified learning architecture, the method identifies the motion features that trigger corrections and rapidly infers the underlying task intent. Experiments show that the proposed approach outperforms baseline methods on both correction-cause identification and goal inference, supporting intervention timing as a useful new dimension for human-robot collaborative learning.
📝 Abstract
Corrections offer a natural modality for people to provide feedback to a robot: a person (i) intervenes in the robot's behavior when they believe the robot is failing (or will fail) the task objectives, and (ii) modifies the robot's behavior to successfully fulfill the task. Each correction conveys information about what the robot should and should not do, since the corrected behavior is more aligned with the task objectives than the original behavior. Most prior work on learning from corrections interprets a correction either as a new demonstration (consisting of the modified robot behavior) or as a preference (for the modified trajectory over the robot's original behavior). However, this overlooks an essential element of correction feedback: the human's decision to intervene in the robot's behavior in the first place. This decision can be influenced by multiple factors, including the robot's task progress, alignment with human expectations, dynamics, motion legibility, and optimality. In this work, we investigate whether the timing of this decision offers a useful signal for inferring these task-relevant influences. In particular, we investigate three potential applications of this learning signal: (1) identifying features of a robot's motion that may prompt people to correct it, (2) quickly inferring the final goal of a human's correction from the timing and initial direction of their correction motion, and (3) learning more precise constraints on task objectives. Our results indicate that incorporating correction timing improves learning for the first two of these applications. Overall, our work provides new insights into the value of correction timing as a signal for robot learning.
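To make the second application concrete, here is a toy heuristic (an illustrative sketch, not the paper's actual model): given the robot's position at the moment of intervention and the initial direction of the human's correction motion, rank a set of hypothetical candidate goals by the cosine similarity between the correction direction and the direction from the robot toward each goal.

```python
import numpy as np

def infer_goal(robot_pos, correction_dir, candidate_goals):
    """Rank hypothetical candidate goals by how well the initial
    correction direction points toward each one (cosine similarity).
    Illustrative heuristic only; not the method from the paper."""
    correction_dir = np.asarray(correction_dir, dtype=float)
    correction_dir = correction_dir / np.linalg.norm(correction_dir)
    scores = []
    for goal in candidate_goals:
        to_goal = np.asarray(goal, dtype=float) - np.asarray(robot_pos, dtype=float)
        to_goal = to_goal / np.linalg.norm(to_goal)
        scores.append(float(correction_dir @ to_goal))  # cosine similarity
    return int(np.argmax(scores)), scores

# Example: the human nudges the robot mostly along +x at the moment
# they intervene, so the goal lying along +x scores highest.
robot_pos = [0.0, 0.0]
correction_dir = [1.0, 0.1]
goals = [[2.0, 0.0], [0.0, 2.0]]
best, scores = infer_goal(robot_pos, correction_dir, goals)
print(best)  # 0
```

The timing of the intervention matters here because it fixes the robot position from which the goal directions are computed; an earlier or later intervention would yield different direction vectors and hence different scores.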