AI Summary
To address the inefficient "stop-and-replan" behavior that dynamic natural language instructions cause in human-robot interaction, this paper proposes a reasoning-based framework that jointly performs incremental language parsing and motion planning. Methodologically, it integrates multi-candidate syntactic analysis with symbolic semantic reasoning to resolve linguistic ambiguity online and update task goals in real time; further, it unifies incremental NLP, symbolic reasoning, and online motion planning into a closed cognitive loop capable of continuously refining end-effector poses, kinematic constraints, and high-level task objectives. Evaluated in realistic voice-controlled scenarios, the system responds seamlessly to instruction revisions and dynamic environmental constraints, reducing trajectory-adjustment latency by 62% and significantly improving collaborative fluency. The core contribution is the semantic-level co-evolution of language understanding and motion control, transcending the limitations of conventional discrete instruction-processing paradigms.
Abstract
Human-robot interaction requires robots to process language incrementally, adapting their actions in real time as speech input evolves. Existing approaches to language-guided robot motion planning typically assume fully specified instructions, resulting in inefficient stop-and-replan behavior when corrections or clarifications occur. In this paper, we introduce a novel reasoning-based incremental parser that integrates an online motion planning algorithm within the cognitive architecture. Our approach enables continuous adaptation to dynamic linguistic input, allowing robots to update motion plans without restarting execution. The incremental parser maintains multiple candidate parses, leveraging reasoning mechanisms to resolve ambiguities and revise interpretations when needed. By combining symbolic reasoning with online motion planning, our system achieves greater flexibility in handling speech corrections and dynamically changing constraints. We evaluate our framework in real-world human-robot interaction scenarios, demonstrating online adaptation of goal poses, constraints, and task objectives. Our results highlight the advantages of integrating incremental language understanding with real-time motion planning for natural and fluid human-robot collaboration. The experiments are demonstrated in the accompanying video at www.acin.tuwien.ac.at/42d5.
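To make the idea of multi-candidate incremental parsing concrete, the following is a minimal, illustrative sketch (not the authors' implementation): each incoming word narrows or reopens a set of candidate referents, and a motion goal is committed only once the parse is unambiguous; a correction marker ("no") reopens the hypothesis space so the plan can be revised without restarting. The class name, toy lexicon, and grasp poses are assumptions for illustration only.

```python
class IncrementalParser:
    """Toy multi-candidate incremental parser driving a motion goal."""

    ACTIONS = {"pick", "place"}
    # object name -> (attribute, illustrative grasp pose)
    OBJECTS = {"cup": ("red", (0.4, 0.1)),
               "box": ("red", (0.7, -0.2)),
               "bottle": ("blue", (0.2, 0.3))}

    def __init__(self):
        self.action = None
        self.targets = set(self.OBJECTS)  # all referents are candidates at first

    def feed(self, word):
        """Consume one word and return the current motion goal (or None)."""
        word = word.strip(",.").lower()
        if word in self.ACTIONS:
            self.action = word
        elif word == "no":
            # speech correction: reopen the referent hypothesis space
            self.targets = set(self.OBJECTS)
        elif word in self.OBJECTS:
            self.targets = {word}
        else:
            # attribute word: keep only referents consistent with it
            narrowed = {t for t in self.targets if self.OBJECTS[t][0] == word}
            if narrowed:
                self.targets = narrowed
        return self.goal()

    def goal(self):
        # commit to a goal pose only when exactly one interpretation survives
        if self.action and len(self.targets) == 1:
            (target,) = self.targets
            return (self.action, self.OBJECTS[target][1])
        return None


parser = IncrementalParser()
for w in "pick the red".split():
    goal = parser.feed(w)          # still ambiguous: cup vs. box -> None
goal = parser.feed("cup")          # unambiguous -> ("pick", (0.4, 0.1))
parser.feed("no,")                 # correction reopens candidates
goal = parser.feed("box")          # revised goal -> ("pick", (0.7, -0.2))
```

The key property this sketch mirrors is that execution never restarts: the goal returned by `feed` can change mid-utterance, which is what the online motion planner consumes at each step.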