🤖 AI Summary
Problem: Robots exhibit limited dexterity in non-prehensile tool manipulation, particularly when operating in confined spaces or with unfamiliar tools. Method: This paper proposes an LLM-driven semantic-motor coordination framework. It introduces a novel tool affordance modeling approach and a stepwise manoeuvrability-driven controller, enabling the first closed-loop integration of LLM-based symbolic planning with vision-guided low-level motion control. The framework adopts a symbolic–subsymbolic hybrid architecture that maps natural language instructions to sequences of feasible motion functions in real time. Contribution/Results: Experiments demonstrate strong generalization across diverse non-prehensile tasks, including prying, pushing, and sweeping, under varying tool and environmental conditions. In constrained spaces, the system achieves significantly higher tool utilization success rates and improved robustness compared to prior methods. This work establishes a new paradigm for dexterous, open-world tool manipulation.
📝 Abstract
The ability to wield tools was once considered exclusive to human intelligence, but it is now known that many other animals, such as crows, also possess this capability. Yet robotic systems still fall short of matching biological dexterity. In this paper, we investigate the use of Large Language Models (LLMs), tool affordances, and object manoeuvrability for non-prehensile tool-based manipulation tasks. Our method uses LLMs, conditioned on scene information and natural language instructions, to perform symbolic task planning for tool-object manipulation. This allows the system to convert a natural language instruction into a sequence of feasible motion functions. We also develop a manoeuvrability-driven controller that uses a new tool affordance model derived from visual feedback. The controller guides the robot's tool use and manipulation actions in incremental steps, even within confined areas. The proposed methodology is evaluated experimentally to demonstrate its effectiveness under various manipulation scenarios.
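As a rough illustration of the two ideas in the abstract (turning a symbolic plan into a sequence of feasible motion functions, and moving toward a target in bounded increments), a minimal Python sketch follows. It is not the authors' implementation: the plan text format, the function names `parse_plan`, `step_towards`, and `incremental_path`, the motion names `approach` and `push`, and the 2D point representation are all assumptions made for illustration, and the LLM call itself is replaced by a hard-coded plan string.

```python
import re
from typing import List, Set, Tuple

Point = Tuple[float, float]


def parse_plan(plan_text: str, allowed: Set[str]) -> List[Tuple[str, List[str]]]:
    """Parse lines like 'push(block, 0.3, 0.0)' into (name, args) steps.

    Steps whose function name is not in `allowed` are rejected, which stands
    in for checking that the plan only uses feasible motion functions.
    """
    steps = []
    for line in plan_text.strip().splitlines():
        match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", line)
        if match is None or match.group(1) not in allowed:
            raise ValueError(f"unsupported plan step: {line!r}")
        args = [a.strip() for a in match.group(2).split(",") if a.strip()]
        steps.append((match.group(1), args))
    return steps


def step_towards(current: Point, target: Point, max_step: float) -> Point:
    """Move from `current` towards `target` by at most `max_step`."""
    dx, dy = target[0] - current[0], target[1] - current[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist <= max_step:
        return target
    scale = max_step / dist
    return (current[0] + dx * scale, current[1] + dy * scale)


def incremental_path(start: Point, target: Point, max_step: float) -> List[Point]:
    """Return the waypoints visited when stepping from start to target."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    path = [start]
    while path[-1] != target:
        path.append(step_towards(path[-1], target, max_step))
    return path


# Hypothetical plan text, standing in for what an LLM planner might return.
plan = parse_plan("approach(stick)\npush(block, 0.3, 0.0)", {"approach", "push"})
waypoints = incremental_path((0.0, 0.0), (1.0, 0.0), max_step=0.4)
```

In a real system, each small step would presumably be followed by fresh visual feedback before the next step is chosen. The sketch omits that loop and only shows the bounded-increment structure.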