🤖 AI Summary
Designing effective reward functions for reinforcement learning (RL) in real-world robotic manipulation remains challenging: sparse rewards hinder training efficiency, while dense rewards require extensive domain expertise. Method: This paper introduces ARCHIE, a framework that leverages a large language model (GPT-4) to automatically generate end-to-end reward functions and success criteria directly from natural language task descriptions. ARCHIE compiles textual specifications into executable reward code, integrating vision-language model (VLM)-based perception, PPO/SAC-based RL, and sim-to-real transfer techniques. Contribution/Results: ARCHIE enables one-shot deployment of manipulation skills on single- and dual-arm robots without a human in the loop. Evaluated on the ABB YuMi platform, it trains diverse complex manipulation tasks in simulation and transfers them to physical execution, substantially reducing reward-engineering effort and achieving full automation from text instruction to physical robot execution.
📝 Abstract
Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have significantly impacted robotics, enabling high-level semantic motion planning applications. Reinforcement Learning (RL), a complementary paradigm, enables agents to autonomously optimize complex behaviors through interaction and reward signals. However, designing effective reward functions for RL remains challenging, especially in real-world tasks where sparse rewards are insufficient and dense rewards require elaborate design. In this work, we propose Autonomous Reinforcement learning for Complex Human-Informed Environments (ARCHIE), an unsupervised pipeline leveraging GPT-4, a pre-trained LLM, to generate reward functions directly from natural language task descriptions. The rewards are used to train RL agents in simulated environments, where we formalize the reward generation process to enhance feasibility. Additionally, GPT-4 automates the coding of task success criteria, creating a fully automated, one-shot procedure for translating human-readable text into deployable robot skills. Our approach is validated through extensive simulated experiments on single-arm and bi-manual manipulation tasks using an ABB YuMi collaborative robot, highlighting its practicality and effectiveness. The tasks are also demonstrated on the real robot setup.
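To make the pipeline concrete, the sketch below shows the *kind* of executable reward function and success criterion an LLM might emit for an instruction like "move the gripper to the cube". This is a hypothetical illustration, not code from the paper: the function names (`reward`, `success`), the observation fields (`gripper_pos`, `cube_pos`), and the threshold and bonus values are all assumptions made for the example.

```python
import numpy as np

# Assumed success tolerance in meters (illustrative, not from the paper).
REACH_THRESHOLD = 0.02

def reward(gripper_pos: np.ndarray, cube_pos: np.ndarray) -> float:
    """Dense shaped reward: negative gripper-to-cube distance,
    plus a fixed bonus once the success tolerance is reached."""
    dist = float(np.linalg.norm(gripper_pos - cube_pos))
    bonus = 10.0 if dist < REACH_THRESHOLD else 0.0
    return -dist + bonus

def success(gripper_pos: np.ndarray, cube_pos: np.ndarray) -> bool:
    """Task success criterion, generated alongside the reward so the
    pipeline can also evaluate rollouts without human supervision."""
    return float(np.linalg.norm(gripper_pos - cube_pos)) < REACH_THRESHOLD
```

In a pipeline of this shape, the generated `reward` would be called every simulation step during PPO/SAC training, while `success` would score evaluation rollouts; generating both from the same task description is what removes the human from the loop.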