From Strangers to Assistants: Fast Desire Alignment for Embodied Agent-User Adaptation

πŸ“… 2025-05-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Embodied agents struggle to rapidly infer ambiguous or implicit user needs in unfamiliar household environments. Method: This paper introduces HA-Desire, a value-driven simulation environment, and FAMER, a novel framework featuring a psychologically grounded β€œdesire”-based reasoning mechanism. FAMER integrates LLM-powered user modeling, desire-guided action filtering, reflective low-redundancy questioning, and goal-relevance-enhanced memory. It departs from conventional instruction-following paradigms to enable efficient intent inference and streamlined interaction. Contribution/Results: Experiments demonstrate that FAMER achieves precise alignment with users’ implicit goals within an average of three interaction rounds, significantly improves task success rates, reduces exploration steps by 37%, and sets new benchmarks in communication efficiency and environmental adaptability.

Technology Category

Application Category

πŸ“ Abstract
While embodied agents have made significant progress in performing complex physical tasks, real-world applications demand more than pure task execution. The agents must collaborate with unfamiliar agents and human users, whose goals are often vague and implicit. In such settings, interpreting ambiguous instructions and uncovering underlying desires is essential for effective assistance. Therefore, fast and accurate desire alignment becomes a critical capability for embodied agents. In this work, we first develop a home assistance simulation environment HA-Desire that integrates an LLM-driven human user agent exhibiting realistic value-driven goal selection and communication. The ego agent must interact with this proxy user to infer and adapt to the user's latent desires. To achieve this, we present a novel framework FAMER for fast desire alignment, which introduces a desire-based mental reasoning mechanism to identify user intent and filter desire-irrelevant actions. We further design a reflection-based communication module that reduces redundant inquiries, and incorporate goal-relevant information extraction with memory persistence to improve information reuse and reduce unnecessary exploration. Extensive experiments demonstrate that our framework significantly enhances both task execution and communication efficiency, enabling embodied agents to quickly adapt to user-specific desires in complex embodied environments.
Problem

Research questions and friction points this paper is trying to address.

Aligning agent actions with vague human desires
Interpreting ambiguous user instructions effectively
Reducing redundant inquiries for efficient collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven human user agent simulation
Desire-based mental reasoning mechanism
Reflection-based communication module design
πŸ”Ž Similar Papers
No similar papers found.
Yuanfei Wang
Yuanfei Wang
Peking University
robot learningreinforcement learning
X
Xinju Huang
School of Artificial Intelligence, Beijing Normal University
Fangwei Zhong
Fangwei Zhong
Beijing Normal University
Embodied AIRobot LearningMulti-Agent LearningComputer Vision
Y
Yaodong Yang
PKU-PsiBot Joint Lab, Institute for Artificial Intelligence, Peking University
Y
Yizhou Wang
School of Computer Science, Peking University, Nat'l Eng. Research Center of Visual Technology, Peking University, State Key Laboratory of General Artificial Intelligence, Peking University
Yuanpei Chen
Yuanpei Chen
South China University of Technology
Robotic
H
Hao Dong
School of Computer Science, Peking University