🤖 AI Summary
Deep reinforcement learning (DRL) agents exhibit severe zero-shot generalization failures under task simplification, suffering over 70% average performance degradation. This reveals a critical overreliance on environmental shortcuts and a fundamental gap from human-like robust adaptability. The paper presents the first systematic evaluation of such generalization failures on simplified tasks and introduces HackAtari, a benchmark built on the Arcade Learning Environment (ALE) that enables controllable, dynamic, and scalable assessment of systematic generalization. Methodologically, the authors compare multiple algorithms (e.g., DQN, PPO) and neural architectures, demonstrating that standard training paradigms fail to induce human-like adaptivity. The core contributions are threefold: (1) identifying task simplification as a pivotal generalization stressor; (2) establishing a dynamic evaluation framework for systematic generalization; and (3) providing a reproducible benchmark and diagnostic toolkit to advance DRL toward human-level adaptive intelligence.
📝 Abstract
Deep reinforcement learning (RL) agents achieve impressive results on a wide variety of tasks, but they lack zero-shot adaptation capabilities. While most robustness evaluations focus on task complexifications, under which humans also struggle to maintain performance, no evaluation has been performed on task simplifications. To tackle this issue, we introduce HackAtari, a set of task variations of the Arcade Learning Environment. We use it to demonstrate that, contrary to humans, RL agents systematically exhibit large performance drops on simpler versions of their training tasks, uncovering agents' consistent reliance on shortcuts. Our analysis across multiple algorithms and architectures highlights the persistent gap between RL agents and human behavioral intelligence, underscoring the need for new benchmarks and methodologies that enforce systematic generalization testing beyond static evaluation protocols. Training and testing in the same environment is not enough to obtain agents equipped with human-like intelligence.
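The evaluation described above can be sketched as a simple relative-drop metric: roll out a trained agent on its original training task and on a simplified variant, then compare mean episode returns. This is a minimal illustrative sketch, not the paper's exact protocol; the function name and the example returns are hypothetical.

```python
import statistics

def relative_drop(original_returns, simplified_returns):
    """Relative performance drop (%) when moving from the training task
    to a simplified variant. A positive value means the agent performs
    worse on the *easier* task, hinting at shortcut reliance."""
    base = statistics.mean(original_returns)
    simplified = statistics.mean(simplified_returns)
    return 100.0 * (base - simplified) / abs(base)

# Hypothetical episode returns for an agent on a game and a simplified
# variant (e.g., a weakened opponent); numbers are illustrative only.
original = [18.0, 20.0, 19.0]
simplified = [4.0, 6.0, 5.0]
print(f"{relative_drop(original, simplified):.1f}% drop")  # → 73.7% drop
```

In practice the simplified variants would come from the HackAtari task modifications, and the returns from rolling out a fixed (frozen) policy; a drop far above zero on an objectively easier task is the failure mode the paper highlights.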