Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks

📅 2024-10-30

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

To address the poor generalization of reinforcement learning (RL) agents for physical control, this paper introduces Kinetix: an open-ended framework that procedurally generates tens of millions of 2D physics-based tasks within a unified RL environment space. It builds on Jax2D, a novel hardware-accelerated physics engine that makes it cheap to simulate billions of environment steps, and uses it for large-scale, mixed-quality pre-training of an online RL agent. Experiments show that the resulting agent zero-shot solves unseen human-designed environments, and that fine-tuning it on tasks of interest significantly outperforms training from scratch, including on some environments where standard RL training fails entirely. The authors take this as evidence that pre-training on large numbers of procedurally generated physics tasks is a feasible route towards general agents for physical control.

📝 Abstract

While large models trained with self-supervised learning on offline datasets have shown remarkable capabilities in text and image domains, achieving the same generalisation for agents that act in sequential decision problems remains an open challenge. In this work, we take a step towards this goal by procedurally generating tens of millions of 2D physics-based tasks and using these to train a general reinforcement learning (RL) agent for physical control. To this end, we introduce Kinetix: an open-ended space of physics-based RL environments that can represent tasks ranging from robotic locomotion and grasping to video games and classic RL environments, all within a unified framework. Kinetix makes use of our novel hardware-accelerated physics engine Jax2D that allows us to cheaply simulate billions of environment steps during training. Our trained agent exhibits strong physical reasoning capabilities in 2D space, being able to zero-shot solve unseen human-designed environments. Furthermore, fine-tuning this general agent on tasks of interest shows significantly stronger performance than training an RL agent *tabula rasa*. This includes solving some environments that standard RL training completely fails at. We believe this demonstrates the feasibility of large scale, mixed-quality pre-training for online RL and we hope that Kinetix will serve as a useful framework to investigate this further.
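As a rough illustration of what "procedurally generating" tasks can mean, the sketch below samples small random 2D scene descriptions from integer seeds. It is not the Kinetix or Jax2D API: the `Shape` fields, the shape kinds, the `player`/`goal`/`neutral` roles, and the `sample_task` function are all hypothetical names chosen for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    kind: str    # "circle" or "box" (hypothetical shape vocabulary)
    x: float     # position in an assumed 5x5 world
    y: float
    size: float
    role: str    # "player", "goal", or "neutral" (hypothetical roles)


def sample_task(seed: int, max_shapes: int = 6) -> list[Shape]:
    """Sample one random 2D task description; the same seed gives the same task."""
    rng = random.Random(seed)
    n_shapes = rng.randint(2, max_shapes)
    shapes = []
    for i in range(n_shapes):
        # The first two shapes take the special roles, so every sampled
        # task has exactly one player and one goal.
        role = "player" if i == 0 else "goal" if i == 1 else "neutral"
        shapes.append(
            Shape(
                kind=rng.choice(["circle", "box"]),
                x=rng.uniform(0.0, 5.0),
                y=rng.uniform(0.0, 5.0),
                size=rng.uniform(0.1, 1.0),
                role=role,
            )
        )
    return shapes


# A small batch of tasks; a real pipeline would sample many more.
tasks = [sample_task(seed) for seed in range(1000)]
```

Because each task is a pure function of its seed, a training run can regenerate any task on demand instead of storing it. A sampler like this makes no attempt to guarantee that a task is solvable or interesting, which is one plausible reading of the "mixed-quality" data the abstract mentions.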

Problem

Research questions and friction points this paper is trying to address.

Training general agents for sequential decision problems

Procedurally generating physics-based tasks for RL training

Achieving zero-shot solving in unseen environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Procedurally generates tens of millions of 2D physics-based tasks

Uses hardware-accelerated physics engine Jax2D

Trains general RL agent for physical control

🔎 Similar Papers

Omnigrasp: Grasping Diverse Objects with Simulated Humanoids