Robot Operation of Home Appliances by Reading User Manuals

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

career value

243K/year

🤖 AI Summary

This work addresses the challenge of enabling household service robots to generalize manipulation skills to novel domestic appliances. We propose a closed-loop symbolic modeling framework grounded in user manual parsing. Methodologically, we introduce the first integration of large vision-language models (VLMs) with updatable symbolic appliance models, leveraging manual-driven visual grounding, text-to-action policy parsing, and a closed-loop self-correction mechanism to infer multi-step, goal-directed manipulation strategies from unstructured textual manuals. Compared to end-to-end VLM-based control baselines, our approach achieves statistically significant improvements in task success rates (p < 0.01) on both simulated and real-world appliance tasks. Results demonstrate that structured symbolic representations critically enhance cross-device generalization and execution robustness. This work establishes a new paradigm for embodied agents to exploit prior knowledge encoded in technical documentation.

Technology Category

Application Category

📝 Abstract

Operating home appliances, among the most common tools in every household, is a critical capability for assistive home robots. This paper presents ApBot, a robot system that operates novel household appliances by"reading"their user manuals. ApBot faces multiple challenges: (i) infer goal-conditioned partial policies from their unstructured, textual descriptions in a user manual document, (ii) ground the policies to the appliance in the physical world, and (iii) execute the policies reliably over potentially many steps, despite compounding errors. To tackle these challenges, ApBot constructs a structured, symbolic model of an appliance from its manual, with the help of a large vision-language model (VLM). It grounds the symbolic actions visually to control panel elements. Finally, ApBot closes the loop by updating the model based on visual feedback. Our experiments show that across a wide range of simulated and real-world appliances, ApBot achieves consistent and statistically significant improvements in task success rate, compared with state-of-the-art large VLMs used directly as control policies. These results suggest that a structured internal representations plays an important role in robust robot operation of home appliances, especially, complex ones.

Problem

Research questions and friction points this paper is trying to address.

Robot operation of novel home appliances via manuals

Grounding textual policies to physical appliance controls

Executing multi-step policies robustly despite error accumulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses VLM to build symbolic appliance model

Visually grounds actions to control elements

Updates model via visual feedback loop

🔎 Similar Papers

Robi Butler: Multimodal Remote Interaction with a Household Robot Assistant