GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment

📅 2024-03-17
🏛️ IEEE/RSJ International Conference on Intelligent Robots and Systems
📈 Citations: 7
Influential: 0

🤖 AI Summary
Problem: In human-robot collaboration, information asymmetry between agents leads to poorly aligned goals and actions. Method: This paper proposes Goal-Oriented Mental Alignment (GOMA), a framework that enables an embodied AI assistant to proactively initiate natural-language communication in service of shared task objectives. It formulates verbal communication as a planning problem that minimizes the misalignment between the goal-relevant parts of the agents' mental states, so the assistant can reason about when and what to communicate. The framework combines embodied reasoning, goal-conditioned mental modeling, and context-aware language generation rather than relying on end-to-end large language model outputs. Results: Evaluated in the Overcooked and VirtualHome environments, the approach improves cooperation performance over the baselines, and human users rate the assistant more favorably.

📝 Abstract
Verbal communication plays a crucial role in human cooperation, particularly when the partners only have incomplete information about the task, environment, and each other’s mental state. In this paper, we propose a novel cooperative communication framework, Goal-Oriented Mental Alignment (GOMA). GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the parts of agents’ mental states that are relevant to the goals. This approach enables an embodied assistant to reason about when and how to proactively initialize communication with humans verbally using natural language to help achieve better cooperation. We evaluate our approach against strong baselines in two challenging environments, Overcooked (a multiplayer game) and VirtualHome (a household simulator). Our experimental results demonstrate that large language models struggle with generating meaningful communication that is grounded in the social and physical context. In contrast, our approach can successfully generate concise verbal communication for the embodied assistant to effectively boost the performance of the cooperation as well as human users’ perception of the assistant.
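
To make the planning formulation concrete, here is a toy Python sketch of the idea of choosing a message that reduces goal-relevant misalignment. It is not the authors' algorithm: the dictionary belief representation, the disagreement-count misalignment measure, the fixed communication cost, and the names `misalignment` and `choose_message` are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical belief representation: fact name -> believed value.
Belief = Dict[str, str]


def misalignment(a: Belief, b: Belief, relevant: Set[str]) -> int:
    """Count the goal-relevant facts on which two beliefs disagree."""
    return sum(1 for fact in relevant if a.get(fact) != b.get(fact))


def choose_message(
    assistant: Belief, human: Belief, relevant: Set[str], cost: float = 0.5
) -> Optional[str]:
    """Pick the fact whose sharing reduces misalignment most, or None to stay silent."""
    base = misalignment(assistant, human, relevant)
    best_fact, best_gain = None, 0.0
    for fact in sorted(assistant):  # sorted for deterministic tie-breaking
        updated = dict(human)
        updated[fact] = assistant[fact]  # assumption: the human adopts the shared value
        gain = base - misalignment(assistant, updated, relevant) - cost
        if gain > best_gain:  # speak only if the benefit outweighs the cost
            best_fact, best_gain = fact, gain
    return best_fact


assistant = {"onion_location": "fridge", "plate_location": "counter", "weather": "rainy"}
human = {"onion_location": "cabinet", "plate_location": "counter", "weather": "sunny"}
relevant = {"onion_location", "plate_location"}  # facts that matter for the shared goal

print(choose_message(assistant, human, relevant))  # prints: onion_location
```

In this sketch the disagreement about the weather is ignored because it is not relevant to the goal, and the assistant stays silent (returns `None`) when no single message reduces misalignment by more than the assumed cost of speaking.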
Problem

Research questions and friction points this paper is trying to address.

Natural Language Communication
AI Assistance
Human-AI Coordination
Innovation

Methods, ideas, or system contributions that make the work stand out.

GOMA
Shared Goals Communication
Human-AI Collaboration