🤖 AI Summary
Embodied agents must understand and fulfill diverse, unanticipated human goals and preferences in open, dynamic environments. To address this, the authors propose the Open-Universe Assistance Games (OU-AGs) framework, which models an unbounded, evolving goal space and supports interactive goal acquisition. Methodologically, it combines large language models' (LLMs) dialogue-comprehension capabilities with Bayesian inference: goals are extracted online through open-ended natural-language interaction, and user intents are simulated with the LLM, so no large-scale offline annotation is required. The result is dynamic goal learning with explicit uncertainty quantification. Evaluations with synthetic users in a text-based shopping domain and the AI2Thor household-robotics environment show substantial improvements in goal tracking and task completion over baselines lacking explicit goal modeling. Both automated LLM-based assessment and human evaluation confirm the framework's effectiveness and robustness.
📝 Abstract
Embodied AI agents must infer, and act interpretably on, diverse human goals and preferences that are not predefined. To formalize this setting, we introduce Open-Universe Assistance Games (OU-AGs), a framework in which the agent must reason over an unbounded and evolving space of possible goals. In this context, we introduce GOOD (GOals from Open-ended Dialogue), a data-efficient, online method that extracts goals in natural language during an interaction with a human and infers a distribution over natural-language goals. GOOD prompts an LLM to simulate users with different complex intents, using its responses to perform probabilistic inference over candidate goals. This approach enables rich goal representations and uncertainty estimation without requiring large offline datasets. We evaluate GOOD in a text-based grocery shopping domain and in a text-operated simulated household robotics environment (AI2Thor), using synthetic user profiles. Our method outperforms a baseline without explicit goal tracking, as confirmed by both LLM-based and human evaluations.
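The core inference loop described above (maintaining a distribution over candidate natural-language goals and updating it from dialogue) can be sketched as a simple Bayesian posterior update. This is a minimal illustration, not the authors' implementation: the function `llm_likelihood` is a hypothetical stand-in for prompting an LLM to simulate a user holding each candidate goal and scoring how well the observed utterance matches.

```python
def update_goal_posterior(prior, utterance, llm_likelihood):
    """Bayesian update: P(goal | utterance) ∝ P(utterance | goal) * P(goal).

    prior: dict mapping natural-language goal strings to probabilities.
    llm_likelihood: callable(utterance, goal) -> likelihood score;
                    in GOOD-style systems this would query an LLM.
    """
    unnormalized = {g: p * llm_likelihood(utterance, g) for g, p in prior.items()}
    z = sum(unnormalized.values())
    return {g: w / z for g, w in unnormalized.items()}


# Toy stand-in likelihood (an assumption for illustration only):
# score high if the goal's last word appears in the user's utterance.
def toy_likelihood(utterance, goal):
    return 0.9 if goal.split()[-1] in utterance else 0.1


prior = {"buy gluten-free bread": 0.5, "buy regular milk": 0.5}
posterior = update_goal_posterior(
    prior, "I need bread, no gluten please", toy_likelihood
)
# The posterior now concentrates on the bread goal.
```

In a full system, new candidate goals extracted from dialogue would be added to the support of the distribution online, which is what makes the goal space open-universe rather than fixed in advance.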