AI Summary
Embodied agents struggle to rapidly infer ambiguous or implicit user needs in unfamiliar household environments. Method: This paper introduces HA-Desire, a value-driven simulation environment, and FAMER, a novel framework featuring a psychologically grounded "desire"-based reasoning mechanism. FAMER integrates LLM-powered user modeling, desire-guided action filtering, reflective low-redundancy questioning, and goal-relevance-enhanced memory. It departs from conventional instruction-following paradigms to enable efficient intent inference and streamlined interaction. Contribution/Results: Experiments demonstrate that FAMER achieves precise alignment with users' implicit goals within an average of three interaction rounds, significantly improves task success rates, reduces exploration steps by 37%, and sets new benchmarks in communication efficiency and environmental adaptability.
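To make desire-guided action filtering concrete, here is a minimal Python sketch. It is not FAMER's implementation. It assumes the agent keeps a probability distribution over a small, hand-written set of candidate desires and updates it with Bayes' rule from a per-desire likelihood score for each user reply. In FAMER that signal would come from LLM-based reasoning; here it is supplied directly. The agent then discards actions whose target object is irrelevant to every sufficiently plausible desire. The desire catalogue, `update_belief`, `filter_actions`, and the 0.2 threshold are all hypothetical.

```python
# Hypothetical desire catalogue: each candidate desire lists the objects relevant to it.
DESIRES = {
    "make_breakfast": {"bread", "toaster", "coffee_mug"},
    "tidy_living_room": {"book", "cushion", "remote"},
    "water_plants": {"watering_can", "plant"},
}


def update_belief(belief, likelihood):
    """Bayes' rule: posterior is proportional to prior x likelihood, renormalised to sum to 1."""
    unnorm = {d: p * likelihood.get(d, 0.0) for d, p in belief.items()}
    total = sum(unnorm.values())
    if total == 0:
        return dict(belief)  # evidence rules out everything: fall back to the prior
    return {d: v / total for d, v in unnorm.items()}


def filter_actions(actions, belief, threshold=0.2):
    """Keep only (verb, object) actions whose object serves some plausible desire."""
    plausible = [d for d, p in belief.items() if p >= threshold]
    relevant = set().union(*(DESIRES[d] for d in plausible))
    return [a for a in actions if a[1] in relevant]


if __name__ == "__main__":
    belief = {d: 1 / len(DESIRES) for d in DESIRES}  # uniform prior
    # Hypothetical likelihoods for the reply "I'm hungry".
    belief = update_belief(
        belief, {"make_breakfast": 0.8, "tidy_living_room": 0.1, "water_plants": 0.1}
    )
    actions = [("pick", "bread"), ("pick", "book"), ("open", "watering_can"), ("use", "toaster")]
    print(filter_actions(actions, belief))  # → [('pick', 'bread'), ('use', 'toaster')]
```

Starting from a uniform belief, this reply leaves only `make_breakfast` above the threshold. The agent therefore keeps the bread and toaster actions and never spends steps on the book or the watering can. Shrinking the action space this way is one plausible route to fewer exploration steps.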
Abstract
While embodied agents have made significant progress in performing complex physical tasks, real-world applications demand more than pure task execution. Agents must collaborate with unfamiliar agents and human users whose goals are often vague and implicit. In such settings, interpreting ambiguous instructions and uncovering underlying desires is essential for effective assistance, which makes fast and accurate desire alignment a critical capability for embodied agents. In this work, we first develop HA-Desire, a home-assistance simulation environment that integrates an LLM-driven human user agent exhibiting realistic value-driven goal selection and communication. The ego agent must interact with this proxy user to infer and adapt to the user's latent desires. To this end, we present FAMER, a novel framework for fast desire alignment that introduces a desire-based mental reasoning mechanism to identify user intent and filter out desire-irrelevant actions. We further design a reflection-based communication module that reduces redundant inquiries, and we incorporate goal-relevant information extraction with memory persistence to improve information reuse and reduce unnecessary exploration. Extensive experiments demonstrate that our framework significantly improves both task execution and communication efficiency, enabling embodied agents to quickly adapt to user-specific desires in complex embodied environments.
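The reflection-based questioning and goal-relevant memory described in the abstract can be illustrated with a small Python sketch. This is a simplification built on stated assumptions, not the paper's module. `DesireMemory` and its methods are hypothetical, and goal relevance is given as a plain set of object names. The LLM-driven reflection step is also swapped out. In its place is a plain check of whether a question was already asked (after lowercasing and stripping trailing punctuation) or whether the object it concerns is already in memory.

```python
class DesireMemory:
    """Hypothetical memory of goal-relevant facts and already-answered questions."""

    def __init__(self):
        self.facts = {}    # object -> last observed location
        self.answers = {}  # normalised question -> user's answer

    def observe(self, observations, relevant_objects):
        """Store only observations about goal-relevant objects; ignore the rest."""
        for obj, location in observations.items():
            if obj in relevant_objects:
                self.facts[obj] = location

    @staticmethod
    def _normalise(question):
        # Lowercase, trim spaces and question marks, collapse inner whitespace.
        return " ".join(question.lower().strip(" ?").split())

    def should_ask(self, question, about_object=None):
        """Stand-in for reflection: skip a question that is redundant given memory."""
        if self._normalise(question) in self.answers:
            return False  # the user already answered this question
        if about_object is not None and about_object in self.facts:
            return False  # the answer is already known from observation
        return True

    def record_answer(self, question, answer):
        self.answers[self._normalise(question)] = answer


if __name__ == "__main__":
    mem = DesireMemory()
    mem.observe({"bread": "kitchen_counter", "sock": "floor"}, {"bread", "toaster"})
    print(mem.should_ask("Where is the bread?", about_object="bread"))  # → False
    print(mem.should_ask("Do you want coffee?"))  # → True
    mem.record_answer("Do you want coffee?", "yes")
    print(mem.should_ask("do you want coffee"))  # → False
```

Filtering at write time keeps the store small, in the spirit of extracting only goal-relevant information. Reusing one `DesireMemory` instance across tasks lets later episodes skip both re-exploration and repeated questions. A real system would need semantic rather than exact-string matching to recognise a rephrased question as redundant.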