Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction

๐Ÿ“… 2024-08-21
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address hard-exploration challenges in reinforcement learning, this paper proposes an object-centric hierarchical modeling framework: it represents environments as semantic units composed of objects and their attributes, constructs a discriminative object-level world model, and integrates count-based intrinsic rewards for efficient unsupervised exploration. The method jointly optimizes object representation learning, dual abstraction (state- and time-based), model-predictive control, and low-level policy execution. Its key innovation lies in the first realization of discriminative world model learning at the object levelโ€”enabling zero-shot transfer, long-horizon planning, and goal-directed decision-making. Experiments on 2D synthetic domains and MiniHack demonstrate substantial improvements over abstraction-free state-of-the-art methods and competing abstraction-based model-free and model-based approaches. Results validate strong performance in single-task solving, cross-environment transfer, and long-range planning.

Technology Category

Application Category

๐Ÿ“ Abstract
In the face of difficult exploration problems in reinforcement learning, we study whether giving an agent an object-centric mapping (describing a set of items and their attributes) allow for more efficient learning. We found this problem is best solved hierarchically by modelling items at a higher level of state abstraction to pixels, and attribute change at a higher level of temporal abstraction to primitive actions. This abstraction simplifies the transition dynamic by making specific future states easier to predict. We make use of this to propose a fully model-based algorithm that learns a discriminative world model, plans to explore efficiently with only a count-based intrinsic reward, and can subsequently plan to reach any discovered (abstract) states. We demonstrate the model's ability to (i) efficiently solve single tasks, (ii) transfer zero-shot and few-shot across item types and environments, and (iii) plan across long horizons. Across a suite of 2D crafting and MiniHack environments, we empirically show our model significantly out-performs state-of-the-art low-level methods (without abstraction), as well as performant model-free and model-based methods using the same abstraction. Finally, we show how to learn low level object-perturbing policies via reinforcement learning, and the object mapping itself by supervised learning.
Problem

Research questions and friction points this paper is trying to address.

Addressing difficult exploration problems in reinforcement learning
Studying object-centric mapping for efficient agent learning
Simplifying transition dynamics with higher-level state abstraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical object-centric mapping for state abstraction
Discriminative world model with count-based intrinsic reward
Supervised learning for object mapping and perturbation policies
๐Ÿ”Ž Similar Papers
No similar papers found.