Deep Reinforcement Learning Agents are not even close to Human Intelligence

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep reinforcement learning (DRL) agents exhibit severe zero-shot generalization failure under task simplification, suffering over 70% average performance degradation. This reveals a critical overreliance on environmental shortcuts and a fundamental gap from human-like robust adaptability. The paper presents the first systematic evaluation of such generalization failures on simplified tasks and introduces HackAtari, a benchmark built on the Arcade Learning Environment (ALE) that enables controllable, dynamic, and scalable assessment of systematic generalization. Methodologically, the authors conduct a comparative analysis across multiple algorithms (e.g., DQN, PPO) and neural architectures, demonstrating that standard training paradigms fail to induce human-like adaptivity. The core contributions are threefold: (1) identifying task simplification as a pivotal generalization stressor; (2) establishing the first dynamic evaluation framework for systematic generalization; and (3) providing a reproducible benchmark and diagnostic toolkit to advance DRL toward human-level adaptive intelligence.
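The "over 70% average performance degradation" quoted above is an aggregate of per-game relative score drops. A minimal sketch of such a metric, with entirely hypothetical scores and function names (illustrative only, not the paper's toolkit):

```python
# Illustrative sketch: average relative performance drop of an agent
# evaluated zero-shot on simplified task variants. All scores below are
# hypothetical placeholder numbers, not results from the paper.

def relative_drop(original: float, simplified: float) -> float:
    """Fraction of the original score lost on the simplified variant."""
    return (original - simplified) / original

def average_drop(scores: dict) -> float:
    """Mean relative drop; scores maps game -> (original, simplified)."""
    drops = [relative_drop(o, s) for o, s in scores.values()]
    return sum(drops) / len(drops)

# Hypothetical example: an agent scoring well on its training tasks but
# collapsing on simplified variants (a signature of shortcut reliance).
scores = {
    "Pong":     (20.0, 2.0),
    "Breakout": (300.0, 60.0),
    "Freeway":  (30.0, 12.0),
}
print(f"average relative drop: {average_drop(scores):.0%}")  # → 77%
```
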

📝 Abstract
Deep reinforcement learning (RL) agents achieve impressive results in a wide variety of tasks, but they lack zero-shot adaptation capabilities. While most robustness evaluations focus on task complexifications, under which humans also struggle to maintain performance, no evaluation has been performed on task simplifications. To tackle this issue, we introduce HackAtari, a set of task variations of the Arcade Learning Environment. We use it to demonstrate that, contrary to humans, RL agents systematically exhibit huge performance drops on simpler versions of their training tasks, uncovering the agents' consistent reliance on shortcuts. Our analysis across multiple algorithms and architectures highlights the persistent gap between RL agents and human behavioral intelligence, underscoring the need for new benchmarks and methodologies that enforce systematic generalization testing beyond static evaluation protocols. Training and testing in the same environment is not enough to obtain agents equipped with human-like intelligence.
Problem

Research questions and friction points this paper is trying to address.

RL agents lack zero-shot adaptation to simplified tasks
Agents rely on shortcuts, failing in simpler task versions
Current evaluation protocols lack benchmarks for human-like generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces HackAtari for task variation testing
Reveals RL agents' reliance on performance shortcuts
Advocates new benchmarks for systematic generalization
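The shortcut-reliance failure mode described in these bullets can be illustrated with a toy example (a hypothetical five-column cue-following task, not one of the paper's Atari variants): a policy that latched onto a spurious cue is perfect in training, but collapses when a simplified variant removes the cue, while a policy that learned the true goal is unaffected.

```python
# Toy illustration of shortcut reliance (not the paper's environments):
# in training, a visual cue always marks the goal column, so following
# the cue scores perfectly. A "simplified" variant removes the cue, and
# the cue-following shortcut policy collapses, while a policy that
# learned the true goal location is unaffected.

import random

GOAL = 3  # true goal column, fixed across episodes

def make_episode(simplified: bool):
    cue = None if simplified else GOAL  # cue co-occurs with goal in training
    return GOAL, cue

def shortcut_policy(cue):
    # Follows the cue when present, otherwise guesses uniformly.
    return cue if cue is not None else random.randrange(5)

def robust_policy(cue):
    # Learned the actual goal location; ignores the cue entirely.
    return GOAL

def score(policy, simplified: bool, n: int = 1000) -> float:
    hits = 0
    for _ in range(n):
        goal, cue = make_episode(simplified)
        hits += policy(cue) == goal
    return hits / n

random.seed(0)
print("shortcut, training:  ", score(shortcut_policy, simplified=False))
print("shortcut, simplified:", score(shortcut_policy, simplified=True))
print("robust,   simplified:", score(robust_policy, simplified=True))
```

Under this setup the shortcut policy scores 1.0 in training but only around chance (0.2) on the simplified variant, mirroring the large zero-shot drops the paper reports on its real task variations.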
Quentin Delfosse
AIML Lab Technische Universität Darmstadt
Robotics, Artificial Intelligence, Open Ended Learning, Intrinsic Motivation
Jannis Blüml
Department of Computer Science, Technical University Darmstadt, Germany; Hessian Center for Artificial Intelligence (hessian.AI)
Fabian Tatai
Technische Universität Darmstadt
Cognitive Science, Intuitive Physics, Decision Making, Virtual Reality, Motion Tracking
Théo Vincent
PhD student at IAS TU Darmstadt
reinforcement learning
Bjarne Gregori
Department of Computer Science, Technical University Darmstadt, Germany
Elisabeth Dillies
Sorbonne Université, Paris, France
Jan Peters
Department of Computer Science, Technical University Darmstadt, Germany; Hessian Center for Artificial Intelligence (hessian.AI); Centre for Cognitive Science, Darmstadt; German Research Center for Artificial Intelligence (DFKI)
Constantin A. Rothkopf
Professor, Technical University Darmstadt & Adjunct Fellow, FIAS
computational cognitive science, cognitive science, computational psychology, perception and action, active vision
Kristian Kersting
Professor of AI & ML, Technical University of Darmstadt, Hessian.ai, DFKI, CAIRNE/ELLIS, AAAI Fellow
Artificial Intelligence, Neurosymbolic AI, Probabilistic Circuits, Machine Learning