🤖 AI Summary
Collaborative robots must proactively infer user goals early in an interaction to minimize reliance on explicit instructions. This paper introduces an active task planning framework for embodied human-robot collaboration. Our method features: (i) a novel meta-prompting protocol enabling lightweight, scalable representation of user preferences and intentions; (ii) the first integration of the commonsense reasoning of large language models (LLMs) with forward-looking behavioral planning, supporting continuous personalization and context-aware task generation; and (iii) a unified architecture combining conditional LLM-based task planning, embodied interaction modeling, and user-study-driven evaluation. Evaluated on domestic manipulation tasks, our system reduces task completion time by 38.7% and decreases user instruction burden by 31.9%, while significantly improving perceived usefulness, usability, and reliability.
📄 Abstract
Collaborative robots must quickly adapt to their partner's intent and preferences to proactively identify helpful actions. This is especially true in situated settings where human partners can continually teach robots new high-level behaviors, visual concepts, and physical skills (e.g., through demonstration), growing the robot's capabilities as the human-robot pair works together to accomplish diverse tasks. In this work, we argue that robots should be able to infer their partner's goals from early interactions and use this information to proactively plan behaviors ahead of explicit instructions from the user. Building on the strong commonsense priors and steerability of large language models, we introduce ProVox ("Proactive Voice"), a novel framework that enables robots to efficiently personalize and adapt to individual collaborators. We design a meta-prompting protocol that empowers users to communicate their distinct preferences, intent, and expected robot behaviors ahead of starting a physical interaction. ProVox then uses the personalized prompt to condition a proactive language model task planner that anticipates a user's intent from the current interaction context and robot capabilities to suggest helpful actions; in doing so, we alleviate user burden, minimizing the amount of time partners spend explicitly instructing and supervising the robot. We evaluate ProVox through user studies grounded in household manipulation tasks (e.g., assembling lunch bags) that measure the efficiency of the collaboration, as well as features such as perceived helpfulness, ease of use, and reliability. Our analysis suggests that both meta-prompting and proactivity are critical, resulting in 38.7% faster task completion times and 31.9% less user burden relative to non-proactive baselines. Supplementary material, code, and videos can be found at https://provox-2025.github.io.
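The abstract describes a two-stage flow: a meta-prompt captures a user's goal, preferences, and the robot's skills before the interaction, and a planner conditioned on that prompt proposes helpful actions without waiting for explicit instructions. The following is a minimal, purely illustrative sketch of that flow; the `MetaPrompt` structure and the rule-based `proactive_plan` function are hypothetical stand-ins for the paper's LLM-based components, not the actual ProVox implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MetaPrompt:
    """Hypothetical container for the personalized context a user
    communicates ahead of the physical interaction."""
    goal: str
    preferences: list[str] = field(default_factory=list)
    robot_skills: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Serialized form that would condition the LLM task planner.
        return (
            f"Goal: {self.goal}\n"
            f"Preferences: {', '.join(self.preferences)}\n"
            f"Robot skills: {', '.join(self.robot_skills)}"
        )

def proactive_plan(meta: MetaPrompt, scene_objects: list[str]) -> str:
    """Stand-in for the proactive LLM planner: suggest a helpful action
    from the current context instead of waiting for an instruction."""
    for skill in meta.robot_skills:
        for obj in scene_objects:
            if obj in meta.goal:  # crude goal-relevance check
                return f"{skill}({obj})"
    return "wait_for_instruction()"

meta = MetaPrompt(
    goal="pack a lunch bag with an apple and a sandwich",
    preferences=["ask before handling fragile items"],
    robot_skills=["pick_and_place"],
)
print(proactive_plan(meta, ["apple", "stapler"]))  # -> pick_and_place(apple)
```

In the actual system, both the relevance check and the action proposal are handled by the language model conditioned on the rendered meta-prompt; this sketch only makes the data flow concrete.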