Deep Reinforcement Learning via Object-Centric Attention

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep reinforcement learning (DRL) agents trained on raw pixel inputs suffer from poor generalization, susceptibility to spurious correlations, and sensitivity to background distractors. To address this, we propose a generic object-centric agent architecture that, for the first time, integrates cognitive-science-inspired object-centric inductive biases with a learnable attention masking mechanism, enabling end-to-end differentiable visual abstraction. Crucially, our approach requires no explicit symbolic representations, task-specific object segmentation modules, or predefined scene structure; instead, it autonomously identifies and attends to task-relevant visual entities while suppressing irrelevant background. The architecture is lightweight and transferable. Evaluated on the Atari benchmark, it significantly improves robustness against image perturbations, reduces sample complexity, and achieves performance competitive with or superior to state-of-the-art pixel-based DRL methods.
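The paper's implementation is not reproduced on this page; as a rough illustration of the masking idea described above — scoring each detected object slot for task relevance and multiplying a sigmoid gate into its features so that irrelevant entities are suppressed — the following minimal Python sketch may help. All names here (`occam_style_mask`, the linear gating head) are illustrative assumptions, not the authors' architecture.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def occam_style_mask(object_features, gate_weights, gate_bias=0.0):
    """Softly mask per-object feature vectors by a learned relevance gate.

    object_features: list of per-object feature vectors (hypothetical slots).
    gate_weights:    weights of an assumed linear scoring head.
    Returns (masked_features, gates).
    """
    masked, gates = [], []
    for feats in object_features:
        # score this slot's task relevance with a linear head
        logit = sum(w * f for w, f in zip(gate_weights, feats)) + gate_bias
        g = sigmoid(logit)  # gate in (0, 1); differentiable, so trainable end-to-end
        gates.append(g)
        # scale the slot's features: low-relevance slots are pushed toward zero
        masked.append([g * f for f in feats])
    return masked, gates

# toy example: three object slots with 2-d features
slots = [[1.0, 0.0], [0.0, 1.0], [-2.0, -2.0]]
weights = [3.0, 3.0]
masked, gates = occam_style_mask(slots, weights)
```

Because the gate is a smooth sigmoid rather than a hard binary mask, the whole pipeline stays differentiable, which matches the paper's claim of end-to-end learnable visual abstraction without a task-specific segmentation module.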

📝 Abstract
Deep reinforcement learning agents trained on raw pixel inputs often fail to generalize beyond their training environments, relying on spurious correlations and irrelevant background details. To address this issue, object-centric agents have recently emerged. However, they require representations tailored to each task's specifications: unlike generic deep agents, no single object-centric architecture can be applied to any environment. Inspired by principles of cognitive science and Occam's Razor, we introduce Object-Centric Attention via Masking (OCCAM), which selectively preserves task-relevant entities while filtering out irrelevant visual information. Specifically, OCCAM takes advantage of the object-centric inductive bias. Empirical evaluations on Atari benchmarks demonstrate that OCCAM significantly improves robustness to novel perturbations and reduces sample complexity while showing similar or improved performance compared to conventional pixel-based RL. These results suggest that structured abstraction can enhance generalization without requiring explicit symbolic representations or domain-specific object extraction pipelines.
Problem

Research questions and friction points this paper is trying to address.

Improves generalization in deep reinforcement learning agents
Filters irrelevant visual information using object-centric attention
Reduces sample complexity while maintaining performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Object-Centric Attention via Masking (OCCAM)
Selectively preserves task-relevant entities
Reduces sample complexity and improves robustness
Jannis Blüml
Department of Computer Science, Technical University of Darmstadt, Germany; Hessian Center for Artificial Intelligence (hessian.AI), Germany
Cedric Derstroff
PhD student, Technische Universität Darmstadt
Reinforcement Learning
Bjarne Gregori
Department of Computer Science, Technical University of Darmstadt, Germany
Elisabeth Dillies
Sorbonne Université, Paris, France
Quentin Delfosse
AIML Lab, Technische Universität Darmstadt
Robotics, Artificial Intelligence, Open Ended Learning, Intrinsic Motivation
Kristian Kersting
Professor of AI & ML, Technical University of Darmstadt, Hessian.ai, DFKI, CAIRNE/ELLIS, AAAI Fellow
Artificial Intelligence, Neurosymbolic AI, Probabilistic Circuits, Machine Learning