Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction

📅 2024-08-21

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

185K/year

🤖 AI Summary

To address hard-exploration challenges in reinforcement learning, this paper proposes an object-centric hierarchical modeling framework: it represents environments as semantic units composed of objects and their attributes, constructs a discriminative object-level world model, and integrates count-based intrinsic rewards for efficient unsupervised exploration. The method jointly optimizes object representation learning, dual abstraction (state- and time-based), model-predictive control, and low-level policy execution. Its key innovation lies in the first realization of discriminative world model learning at the object level—enabling zero-shot transfer, long-horizon planning, and goal-directed decision-making. Experiments on 2D synthetic domains and MiniHack demonstrate substantial improvements over abstraction-free state-of-the-art methods and competing abstraction-based model-free and model-based approaches. Results validate strong performance in single-task solving, cross-environment transfer, and long-range planning.

Technology Category

Application Category

📝 Abstract

In the face of difficult exploration problems in reinforcement learning, we study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allow for more efficient learning. We found this problem is best solved hierarchically by modelling items at a higher level of state abstraction to pixels, and attribute change at a higher level of temporal abstraction to primitive actions. This abstraction simplifies the transition dynamic by making specific future states easier to predict. We make use of this to propose a fully model-based algorithm that learns a discriminative world model, plans to explore efficiently with only a count-based intrinsic reward, and can subsequently plan to reach any discovered (abstract) states. We demonstrate the model's ability to (i) efficiently solve single tasks, (ii) transfer zero-shot and few-shot across item types and environments, and (iii) plan across long horizons. Across a suite of 2D crafting and MiniHack environments, we empirically show our model significantly out-performs state-of-the-art low-level methods (without abstraction), as well as performant model-free and model-based methods using the same abstraction. Finally, we show how to learn low level object-perturbing policies via reinforcement learning, and the object mapping itself by supervised learning.

Problem

Research questions and friction points this paper is trying to address.

Addressing difficult exploration problems in reinforcement learning

Studying object-centric mapping for efficient agent learning

Simplifying transition dynamics with higher-level state abstraction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical object-centric mapping for state abstraction

Discriminative world model with count-based intrinsic reward

Supervised learning for object mapping and perturbation policies

🔎 Similar Papers

Open-World Object Detection with Instance Representation Learning