🤖 AI Summary
This work addresses the limited generalization of existing model-based reinforcement learning methods in environments where irrelevant factors—such as texture or color—vary despite structural similarity. Inspired by human cognition's ability to segment sensory streams into discrete events, the authors propose the Event-Aware World Model (EAWM), a general framework. This framework employs an unsupervised automatic event generator and a Generic Event Segmentor (GES) to detect event boundaries without manual annotations, while leveraging event prediction to shape the representation space and guide the policy toward critical spatiotemporal transitions. The approach is annotation-free and compatible with diverse world model architectures. Evaluated on standard benchmarks including Atari 100K, Craftax 1M, DeepMind Control, and DMC-GB2, it achieves state-of-the-art performance, surpassing strong baselines by 10%–45%.
📝 Abstract
While model-based reinforcement learning (MBRL) improves sample efficiency by learning world models from raw observations, existing methods struggle to generalize across structurally similar scenes and remain vulnerable to spurious variations such as texture or color shifts. From a cognitive science perspective, humans segment continuous sensory streams into discrete events and rely on these key events for decision-making. Motivated by this principle, we propose the Event-Aware World Model (EAWM), a general framework that learns event-aware representations to streamline policy learning without requiring handcrafted labels. EAWM employs an automated event generator to derive events from raw observations and introduces a Generic Event Segmentor (GES) to identify event boundaries, which mark the start and end times of event segments. Through event prediction, the representation space is shaped to capture meaningful spatio-temporal transitions. Beyond this, we present a unified formulation of seemingly distinct world model architectures and show the broad applicability of our method. Experiments on Atari 100K, Craftax 1M, DeepMind Control 500K, and DMC-GB2 500K demonstrate that EAWM consistently boosts the performance of strong MBRL baselines by 10%-45%, setting new state-of-the-art results across benchmarks. Our code is released at https://github.com/MarquisDarwin/EAWM.
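The abstract does not specify how the Generic Event Segmentor detects boundaries; as a rough intuition for unsupervised event segmentation, one simple baseline is to flag timesteps where consecutive latent states change sharply. The sketch below is a hypothetical stand-in (the function name, the cosine-distance criterion, and the `threshold` parameter are all assumptions, not the paper's learned GES):

```python
import numpy as np

def detect_event_boundaries(latents, threshold=0.5):
    """Flag timestep t as an event boundary when the cosine distance
    between consecutive latent states z_{t-1} and z_t exceeds `threshold`.

    This is an illustrative heuristic, not EAWM's learned segmenter.
    """
    z = np.asarray(latents, dtype=float)
    # Normalize each latent vector to unit length.
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    z = z / np.clip(norms, 1e-8, None)
    # Cosine distance between each pair of consecutive states.
    dist = 1.0 - np.sum(z[:-1] * z[1:], axis=1)
    # Report boundaries at the later timestep of each abrupt transition.
    return [int(t) + 1 for t in np.nonzero(dist > threshold)[0]]
```

For example, a latent trajectory that dwells near one direction and then jumps to an orthogonal one yields a single boundary at the jump; a learned segmenter would instead be trained so that these boundaries align with the automatically generated events.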