Object-Centric World Models for Causality-Aware Reinforcement Learning

📅 2025-11-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing world models struggle to model high-dimensional, non-stationary environments with strong multi-object interactions because they rely on holistic scene representations rather than object-level decomposition. To address this, the paper proposes STICA, a Slot Transformer-based framework that decomposes visual observations into object-centric token sequences and adds action and reward tokens to the same sequence. STICA also introduces causal attention (described in this summary as its first application in world modeling), which enables token-level causal reasoning and thereby improves the interpretability and decision efficiency of the downstream policy and value networks. Experiments on complex multi-object interaction tasks show that STICA outperforms state-of-the-art methods in both sample efficiency and asymptotic performance.

📝 Abstract

World models have been developed to support sample-efficient deep reinforcement learning agents. However, it remains challenging for world models to accurately replicate environments that are high-dimensional, non-stationary, and composed of multiple objects with rich interactions, since most world models learn holistic representations of all environmental components. By contrast, humans perceive the environment by decomposing it into discrete objects, facilitating efficient decision-making. Motivated by this insight, we propose Slot Transformer Imagination with CAusality-aware reinforcement learning (STICA), a unified framework in which object-centric Transformers serve as the world model and causality-aware policy and value networks. STICA represents each observation as a set of object-centric tokens, together with tokens for the agent action and the resulting reward, enabling the world model to predict token-level dynamics and interactions. The policy and value networks then estimate token-level cause-effect relations and use them in the attention layers, yielding causality-guided decision-making. Experiments on object-rich benchmarks demonstrate that STICA consistently outperforms state-of-the-art agents in both sample efficiency and final performance.
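The token layout described in the abstract can be illustrated with a minimal sketch. It assumes that each time step contributes its object-centric (slot) tokens followed by one action token and one reward token; the ordering, the `Token` class, the `build_sequence` helper, and the zero-padded scalar embeddings are all hypothetical choices made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Token:
    kind: str            # "slot", "action", or "reward"
    step: int            # time step the token belongs to
    vector: List[float]  # embedding of the token


def build_sequence(
    slots_per_step: Sequence[Sequence[Sequence[float]]],
    actions: Sequence[int],
    rewards: Sequence[float],
    dim: int,
) -> List[Token]:
    """Flatten a trajectory into [slot..., action, reward] for each step."""
    sequence: List[Token] = []
    for t, (slots, action, reward) in enumerate(
        zip(slots_per_step, actions, rewards)
    ):
        for slot in slots:
            if len(slot) != dim:
                raise ValueError("every slot vector must have length dim")
            sequence.append(Token("slot", t, list(slot)))
        # Assumed toy embedding: scalar in the first coordinate, zeros elsewhere.
        padding = [0.0] * (dim - 1)
        sequence.append(Token("action", t, [float(action)] + padding))
        sequence.append(Token("reward", t, [float(reward)] + padding))
    return sequence


# Two time steps, two object slots per step, two-dimensional embeddings.
example = build_sequence(
    slots_per_step=[[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]],
    actions=[1, 0],
    rewards=[0.0, 1.0],
    dim=2,
)
example_kinds = [token.kind for token in example]
```

In a real agent the slot vectors would come from a learned object-centric encoder and the action and reward embeddings would be learned as well; the sketch only shows how the three token types could share one sequence.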

Problem

Research questions and friction points this paper is trying to address.

Developing object-centric world models for high-dimensional multi-object environments

Enabling causality-aware reinforcement learning through token-level dynamics prediction

Improving sample efficiency and performance in object-rich interactive environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Object-centric Transformers model world dynamics

Token-level cause-effect relations guide decisions

Causality-aware policy networks enhance reinforcement learning
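One way to picture cause-effect relations guiding attention is a mask over token pairs. The sketch below is an assumption-laden illustration, not the paper's mechanism: it uses a hand-written binary matrix `causal_mask`, where `causal_mask[i][j] == 1` means token `j` is treated as a potential cause of token `i`, whereas STICA is described as estimating these relations with its policy and value networks. The function name `masked_attention` is hypothetical.

```python
import math
from typing import List, Sequence


def masked_attention(
    queries: Sequence[Sequence[float]],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    causal_mask: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Scaled dot-product attention restricted to allowed token pairs."""
    dim = len(keys[0])
    outputs: List[List[float]] = []
    for i, query in enumerate(queries):
        scores = []
        for j, key in enumerate(keys):
            if causal_mask[i][j]:
                dot = sum(a * b for a, b in zip(query, key))
                scores.append(dot / math.sqrt(dim))
            else:
                scores.append(float("-inf"))  # blocked pair gets zero weight
        peak = max(scores)
        if peak == float("-inf"):
            raise ValueError(f"token {i} has no allowed token to attend to")
        # Numerically stable softmax over the allowed pairs.
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        weights = [weight / total for weight in weights]
        outputs.append(
            [
                sum(weight * value[d] for weight, value in zip(weights, values))
                for d in range(len(values[0]))
            ]
        )
    return outputs


# Token 0 may only attend to itself; token 1 may attend to both tokens.
tokens = [[1.0, 0.0], [0.0, 1.0]]
mask = [[1, 0], [1, 1]]
attended = masked_attention(tokens, tokens, tokens, mask)
```

Here the first output row equals the first token's value vector because every other pair is blocked, while the second row mixes both value vectors. A learned, soft relation score could replace the binary mask without changing the overall structure.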
