🤖 AI Summary
In human-robot collaboration, human goals are often ambiguous, difficult to articulate, and open-ended; existing approaches are constrained by predefined goal sets or unimodal inputs, resulting in poor generalization. This paper introduces BALI, a bidirectional action-language inference framework that jointly leverages action trajectories and natural language cues to construct a sliding temporal-window planning tree, enabling coupled action-language reasoning. BALI dynamically decides, based on expected information gain, whether to actively query the user or execute assistive actions. By relaxing the restrictive closed-world goal assumption, the framework supports open-domain goal inference without requiring goal enumeration. Evaluated on collaborative cooking tasks, BALI significantly reduces misclassification rates and improves prediction stability, excelling in particular at zero-shot generalization to unseen goals.
📝 Abstract
To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where the set of possible goals is open-ended and may be novel to the robot. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
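The query-vs-act trade-off described in the abstract can be made concrete with a standard expected-information-gain criterion. The sketch below is a minimal, hypothetical illustration (the function names, belief representation, and cost units are our assumptions, not the paper's implementation): the robot maintains a belief over candidate goals and asks a clarifying question only when the expected reduction in goal uncertainty exceeds the interruption cost.

```python
# Hypothetical sketch of an information-gain-based query decision
# (illustrative only; not the paper's actual implementation).
import math

def entropy(probs):
    """Shannon entropy (in bits) of a goal-belief distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(belief, answer_likelihoods):
    """Expected drop in goal entropy from asking one clarifying question.

    belief: dict goal -> prior probability P(g).
    answer_likelihoods: dict answer -> dict goal -> P(answer | g).
    """
    h_prior = entropy(belief.values())
    eig = 0.0
    for ans, lik in answer_likelihoods.items():
        # Marginal probability of this answer: P(a) = sum_g P(a|g) P(g).
        p_ans = sum(lik[g] * belief[g] for g in belief)
        if p_ans == 0:
            continue
        # Posterior over goals given this answer (Bayes' rule).
        posterior = [lik[g] * belief[g] / p_ans for g in belief]
        eig += p_ans * (h_prior - entropy(posterior))
    return eig

def should_query(belief, answer_likelihoods, interruption_cost):
    """Query the human only if expected information gain exceeds the cost."""
    return expected_information_gain(belief, answer_likelihoods) > interruption_cost
```

For a uniform belief over two goals and a perfectly discriminating question, the expected gain is exactly 1 bit, so the robot queries only when the interruption cost is below that threshold.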