Affordance-Aware Interactive Decision-Making and Execution for Ambiguous Instructions

📅 2026-02-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling robots to efficiently identify task-relevant objects and execute actions in unfamiliar environments under ambiguous human instructions. To this end, the authors propose AIDE, a dual-stream framework that integrates interactive exploration with vision-language reasoning. It combines a multi-stage inference (MSI) stream for decision-making with an accelerated decision-making (ADM) stream for execution, which together enable zero-shot affordance analysis and efficient closed-loop execution. The approach interprets vague commands and interacts with the environment in real time. In both simulation and real-world settings it achieves a task planning success rate above 80% and closed-loop execution accuracy above 95% at 10 Hz, outperforming existing vision-language model (VLM)-based methods.

๐Ÿ“ Abstract
Enabling robots to explore and act in unfamiliar environments under ambiguous human instructions by interactively identifying task-relevant objects (e.g., identifying cups or beverages for"I'm thirsty") remains challenging for existing vision-language model (VLM)-based methods. This challenge stems from inefficient reasoning and the lack of environmental interaction, which hinder real-time task planning and execution. To address this, We propose Affordance-Aware Interactive Decision-Making and Execution for Ambiguous Instructions (AIDE), a dual-stream framework that integrates interactive exploration with vision-language reasoning, where Multi-Stage Inference (MSI) serves as the decision-making stream and Accelerated Decision-Making (ADM) as the execution stream, enabling zero-shot affordance analysis and interpretation of ambiguous instructions. Extensive experiments in simulation and real-world environments show that AIDE achieves the task planning success rate of over 80\% and more than 95\% accuracy in closed-loop continuous execution at 10 Hz, outperforming existing VLM-based methods in diverse open-world scenarios.
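To make the dual-stream idea concrete, the following toy sketch separates a slow decision stream, which resolves an ambiguous instruction to a task-relevant object, from a fast closed-loop execution stream ticking at 10 Hz. It is an illustration under stated assumptions, not the authors' AIDE implementation. The keyword-to-affordance table `AFFORDANCES`, the `SceneObject` type, the 1-D positions, the proportional controller, and the parameters `decide_every`, `gain`, and `tol` are all hypothetical stand-ins for the VLM-based MSI reasoning and the ADM execution stream described in the abstract.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# HYPOTHETICAL affordance table: intent keyword -> object categories that can
# satisfy it. A stand-in for VLM-based reasoning, not the paper's model.
AFFORDANCES = {
    "thirsty": {"cup", "bottle", "can"},
    "cold": {"blanket", "jacket"},
}


@dataclass
class SceneObject:
    name: str
    category: str
    position: float  # 1-D position, a simplification of a real scene


def decide_target(instruction: str, scene: List[SceneObject],
                  robot_pos: float) -> Optional[SceneObject]:
    """Slow decision stream: resolve an ambiguous instruction to one object."""
    words = set(re.findall(r"[a-z]+", instruction.lower()))
    wanted = set()
    for intent, categories in AFFORDANCES.items():
        if intent in words:
            wanted |= categories
    candidates = [obj for obj in scene if obj.category in wanted]
    if not candidates:
        return None
    # Among affordance-compatible objects, pick the one nearest the robot.
    return min(candidates, key=lambda obj: abs(obj.position - robot_pos))


def run_dual_stream(instruction: str, scene: List[SceneObject],
                    rate_hz: float = 10.0, decide_every: int = 5,
                    gain: float = 0.5, tol: float = 0.01,
                    max_ticks: int = 100) -> Tuple[Optional[str], float]:
    """Fast execution stream: a closed control loop at rate_hz (simulated time).

    The decision stream is re-queried only every `decide_every` ticks, so the
    expensive reasoning runs at a lower rate than the control loop.
    Returns (name of the reached object, or None if nothing fits the
    instruction or the target was not reached within max_ticks,
    elapsed simulated seconds).
    """
    dt = 1.0 / rate_hz
    pos = 0.0
    target = None
    for tick in range(max_ticks):
        if tick % decide_every == 0:
            target = decide_target(instruction, scene, pos)
        if target is None:
            return None, tick * dt  # nothing in the scene fits the instruction
        error = target.position - pos
        if abs(error) < tol:
            return target.name, tick * dt  # reached the chosen object
        pos += gain * error  # proportional step toward the target
    return None, max_ticks * dt  # gave up before reaching the target


if __name__ == "__main__":
    scene = [SceneObject("mug_1", "cup", 2.0),
             SceneObject("blanket_1", "blanket", 0.5),
             SceneObject("bottle_1", "bottle", -3.0)]
    print(run_dual_stream("I'm thirsty", scene))  # -> ('mug_1', 0.8)
```

Because time is simulated (`tick * dt`), the loop is deterministic and easy to test. A deployed system would more likely run the two streams concurrently, with the slow stream publishing its latest decision for the fast loop to consume.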
Problem

Research questions and friction points this paper is trying to address.

affordance-aware
ambiguous instructions
interactive decision-making
vision-language model
robotic execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

affordance-aware
interactive decision-making
vision-language reasoning
ambiguous instructions
zero-shot execution