GET-USE: Learning Generalized Tool Usage for Bimanual Mobile Manipulation via Simulated Embodiment Extensions

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current robots lack the ability to generalize tool use across arbitrary objects: they struggle to autonomously identify, select, and dexterously manipulate the most suitable tool from a diverse set, especially when the optimal tool is unavailable. To address this, we propose GeT-USE, a two-stage framework. First, embodied exploration in simulation constructs virtual end-effectors and distills task-oriented, geometry-based universal tool priors. Second, these priors are transferred to a real-world bimanual mobile robot, where vision-driven motion policy learning enables cross-object tool selection and dexterous manipulation. Evaluated on a 22-DOF mobile dual-arm platform, GeT-USE achieves 30-60% higher success rates than state-of-the-art methods across three complex tool-use tasks, realizing generalizable tool use from geometric perception to embodied manipulation without task-specific fine-tuning.

📝 Abstract
The ability to use random objects as tools in a generalizable manner is a missing piece in robots' intelligence today to boost their versatility and problem-solving capabilities. State-of-the-art robotic tool-usage methods have focused on procedurally generating or crowd-sourcing datasets of tools for a task to learn how to grasp and manipulate them for that task. However, these methods assume that only one object is provided and that, with the correct grasp, the task is achievable; they cannot identify, grasp, and use the best object for a task when many are available, especially when the optimal tool is absent. In this work, we propose GeT-USE, a two-step procedure that learns to perform real-robot generalized tool usage by first learning to extend the robot's embodiment in simulation and then transferring the learned strategies to real-robot visuomotor policies. Our key insight is that by exploring a robot's embodiment extensions (i.e., building new end-effectors) in simulation, the robot can identify the general tool geometries most beneficial for a task. This learned geometric knowledge can then be distilled to perform generalized tool-usage tasks by selecting and using the best available real-world object as a tool. On a real robot with 22 degrees of freedom (DOFs), GeT-USE outperforms state-of-the-art methods by 30-60% in success rate across three vision-based bimanual mobile manipulation tool-usage tasks.
Problem

Research questions and friction points this paper is trying to address.

Robots lack the ability to use arbitrary objects as tools in a generalizable way
Existing methods cannot identify the best object when multiple are available
Current approaches fail when the optimal tool is absent from the environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulated embodiment extensions for tool geometry learning
Transferring learned strategies to real-robot visuomotor policies
Selecting optimal real-world objects as tools via distillation
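The two-stage recipe above can be illustrated with a toy sketch: explore virtual end-effector geometries in a stand-in "simulator" to distill a geometric prior, then select the available real object closest to that prior. All names, the geometry descriptor, and the reward function are illustrative assumptions, not the paper's actual implementation:

```python
import math

def simulate_task_reward(geometry):
    # Stage 1 stand-in: a toy reward that favors long, hooked
    # end-effector extensions for a reaching/pulling task.
    length, hook_angle = geometry
    return length * math.sin(hook_angle)

def learn_tool_prior(candidate_geometries):
    # Explore virtual embodiment extensions and keep the best
    # geometry as a task-oriented geometric prior.
    return max(candidate_geometries, key=simulate_task_reward)

def select_real_tool(prior, available_objects):
    # Stage 2 stand-in: pick the available object whose
    # (length, hook_angle) descriptor is closest to the prior.
    return min(available_objects,
               key=lambda obj: math.dist(prior, obj["geometry"]))

# Hypothetical search space of virtual end-effector geometries.
virtual = [(l / 10, a * math.pi / 8)
           for l in range(1, 6) for a in range(1, 5)]
prior = learn_tool_prior(virtual)

# Hypothetical objects available in the scene.
objects = [
    {"name": "ruler",    "geometry": (0.4, 0.1)},
    {"name": "umbrella", "geometry": (0.5, math.pi / 2)},
    {"name": "spoon",    "geometry": (0.2, 1.0)},
]
best = select_real_tool(prior, objects)
```

In the real system, the exhaustive search is replaced by learned exploration in simulation and the nearest-geometry lookup by a vision-based visuomotor policy; the sketch only conveys the explore-then-select structure.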