Slot-MPC: Goal-Conditioned Model Predictive Control with Object-Centric Representations

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of object-centric world models and RL approaches that learn reactive policies fixed at inference time, which struggle in novel situations. The authors propose Slot-MPC, which integrates structured slot-based representations with gradient-based model predictive control (MPC). A vision encoder learns object-centric slot representations, and an action-conditioned dynamics model trained on those slots enables online planning through differentiable rollouts. Evaluated on simulated robotic manipulation tasks, the approach improves both task performance and planning efficiency over non-object-centric world model baselines. In the offline setting with limited state-action coverage, gradient-based MPC also performs better than gradient-free, sampling-based MPC.

📝 Abstract

Predictive world models enable agents to model scene dynamics and reason about the consequences of their actions. Inspired by human perception, object-centric world models capture scene dynamics using object-level representations, which can be used for downstream applications such as action planning. However, most object-centric world models and reinforcement learning (RL) approaches learn reactive policies that are fixed at inference time, limiting generalization to novel situations. We propose Slot-MPC, an object-centric world modeling framework that enables planning through Model Predictive Control (MPC). Slot-MPC leverages vision encoders to learn slot-based representations, which encode individual objects in the scene, and uses these structured representations to learn an action-conditioned object-centric dynamics model. At inference time, the learned dynamics model enables action planning via MPC, allowing agents to adapt to previously unseen situations. Since the learned world model is differentiable, we can use gradient-based MPC to directly optimize actions, which is computationally more efficient than relying on gradient-free, sampling-based MPC methods. Experiments on simulated robotic manipulation tasks show that Slot-MPC improves both task performance and planning efficiency compared to non-object-centric world model baselines. In the considered offline setting with limited state-action coverage, we find that gradient-based MPC performs better than gradient-free, sampling-based MPC. Our results demonstrate that explicitly structured, object-centric representations provide a strong inductive bias for controllable and generalizable decision-making. Code and additional results are available at https://slot-mpc.github.io.
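The idea of planning by differentiating through a learned rollout can be illustrated with a minimal sketch. It is not the Slot-MPC implementation: the linear per-slot dynamics, the constants A and GAINS, the quadratic goal cost, and all function names are hypothetical stand-ins chosen so the gradient can be written by hand in plain Python. A real system would use a learned neural dynamics model and an automatic differentiation library.

```python
# Toy sketch of gradient-based MPC through a differentiable rollout.
# All dynamics, constants, and names are hypothetical, not the Slot-MPC model.

A = 0.9             # assumed per-step decay applied to every slot state
GAINS = [1.0, 0.5]  # assumed effect of the shared action on each slot


def rollout(slots, actions):
    """Forward pass: s_{t+1} = A * s_t + gain_k * a_t for each 2-D slot k."""
    for a in actions:
        slots = [[A * s[d] + g * a[d] for d in range(2)]
                 for s, g in zip(slots, GAINS)]
    return slots


def cost_and_grad(slots0, actions, goal, reg=1e-3):
    """Terminal goal cost plus action penalty, and its gradient w.r.t. actions."""
    final = rollout(slots0, actions)
    cost = sum((f[d] - g[d]) ** 2 for f, g in zip(final, goal) for d in range(2))
    cost += reg * sum(a[d] ** 2 for a in actions for d in range(2))
    # Backward (adjoint) pass: lam[k] is the cost gradient w.r.t. slot k's state.
    lam = [[2.0 * (f[d] - g[d]) for d in range(2)] for f, g in zip(final, goal)]
    grads = [None] * len(actions)
    for t in reversed(range(len(actions))):
        grads[t] = [sum(gk * l[d] for gk, l in zip(GAINS, lam))
                    + 2.0 * reg * actions[t][d] for d in range(2)]
        lam = [[A * l[d] for d in range(2)] for l in lam]
    return cost, grads


def plan(slots0, goal, horizon=5, steps=200, lr=0.05):
    """Optimize an action sequence by plain gradient descent on the rollout cost."""
    actions = [[0.0, 0.0] for _ in range(horizon)]
    for _ in range(steps):
        _, grads = cost_and_grad(slots0, actions, goal)
        actions = [[a[d] - lr * g[d] for d in range(2)]
                   for a, g in zip(actions, grads)]
    return actions


def mpc_control(slots, goal, n_steps=3):
    """Receding horizon loop: replan, apply only the first action, repeat."""
    for _ in range(n_steps):
        first_action = plan(slots, goal)[:1]
        slots = rollout(slots, first_action)  # toy model doubles as environment
    return slots


if __name__ == "__main__":
    start = [[0.0, 0.0], [1.0, 1.0]]
    target = [[1.0, 0.0], [1.2, 1.0]]
    print(mpc_control(start, target))
```

The sampling-based alternative mentioned in the abstract would replace the gradient step in plan with drawing many candidate action sequences and keeping the lowest-cost ones, which typically needs many more model evaluations per planning step.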

Problem

Research questions and friction points this paper is trying to address.

object-centric representations

model predictive control

generalization

action planning

world models

Innovation

Methods, ideas, or system contributions that make the work stand out.

object-centric representation

model predictive control

differentiable dynamics model