AnyUser: Translating Sketched User Intent into Domestic Robots

📅 2026-04-06

🤖 AI Summary

This work addresses the challenge of enabling non-expert users to command domestic robots through freehand sketches and natural language, without pre-built maps or programming. The authors propose AnyUser, a system built on a unified multimodal instruction framework that integrates sketch and language inputs. It parses these unstructured commands into spatial-semantic primitives and uses a hierarchical policy to generate actions for task execution in unknown environments. The approach interprets diverse instructions with high accuracy in large-scale simulated benchmarks and performs real-world wiping and cleaning tasks on KUKA and Realman robotic platforms. A user study reports task completion rates of 85.7%–96.4%, along with improved usability and user satisfaction.

📝 Abstract

We introduce AnyUser, a unified robotic instruction system for intuitive domestic task instruction via free-form sketches on camera images, optionally with language. AnyUser interprets multimodal inputs (sketch, vision, language) as spatial-semantic primitives to generate executable robot actions, requiring no prior maps or models. Novel components include multimodal fusion for understanding and a hierarchical policy for robust action generation. Efficacy is shown via extensive evaluations: (1) quantitative benchmarks on a large-scale dataset, showing high accuracy in interpreting diverse sketch-based commands across various simulated domestic scenes; (2) real-world validation on two distinct robotic platforms, a statically mounted 7-DoF assistive arm (KUKA LBR iiwa) and a dual-arm mobile manipulator (Realman RMC-AIDAL), performing representative tasks such as targeted wiping and area cleaning, confirming the system's ability to ground instructions and execute them reliably in physical environments; and (3) a comprehensive user study involving diverse demographics (elderly, simulated non-verbal, low technical literacy), demonstrating significant improvements in usability and task specification efficiency, with high task completion rates (85.7%-96.4%) and user satisfaction. AnyUser bridges the gap between advanced robotic capabilities and the need for accessible non-expert interaction, laying the foundation for practical assistive robots adaptable to real-world human environments.
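As a rough illustration of what turning a sketch into a spatial primitive could look like, the toy example below labels a freehand stroke drawn in image coordinates as either a closed region or an open path. This is not the paper's method: the `Primitive` type, the `classify_stroke` function, the `"region"`/`"path"` labels, and the `closure_ratio` threshold are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]  # pixel coordinates (x, y) on the camera image


@dataclass
class Primitive:
    # Hypothetical container; the paper does not specify this structure.
    kind: str            # "region" (closed stroke) or "path" (open stroke)
    points: List[Point]  # stroke vertices in pixel coordinates


def stroke_length(points: List[Point]) -> float:
    """Total polyline length of the stroke in pixels."""
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def classify_stroke(points: List[Point], closure_ratio: float = 0.15) -> Primitive:
    """Label a stroke as a closed region or an open path.

    Assumption (not from the paper): a stroke counts as closed when the gap
    between its first and last point is small relative to its total length.
    """
    if len(points) < 2:
        raise ValueError("a stroke needs at least two points")
    length = stroke_length(points)
    if length == 0:
        raise ValueError("stroke has zero length")
    gap = hypot(points[-1][0] - points[0][0], points[-1][1] - points[0][1])
    kind = "region" if gap <= closure_ratio * length else "path"
    return Primitive(kind=kind, points=list(points))


# A nearly closed loop and a straight line, both in pixel coordinates.
loop = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.0, 1.0)]
line = [(0.0, 0.0), (10.0, 0.0)]
print(classify_stroke(loop).kind)
print(classify_stroke(line).kind)
```

Under this assumed convention, a closed stroke might correspond to an area-cleaning request and an open stroke to a targeted wipe along a line; a real system would also need to fuse the stroke with visual and language context, which this sketch omits.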

Problem

Research questions and friction points this paper is trying to address.

domestic robots

intuitive interaction

sketch-based instruction

non-expert users

multimodal input

Innovation

Methods, ideas, or system contributions that make the work stand out.

sketch-based instruction

multimodal fusion

spatial-semantic primitives

hierarchical policy

map-free robot execution
