Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents

📅 2023-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of evaluating the long-term memory capabilities of agents in partially observable, dynamic environments. The authors propose a memory evaluation framework designed for endless task horizons, introducing three 2D, infinitely extensible, partially observable environments that shift the focus from sample efficiency (typical of finite-horizon benchmarks) to sustained memory effectiveness. Their method integrates reinforcement learning via PPO with two sequence-modeling architectures (Transformer-XL and GRU) in a unified agent framework to compare memory mechanisms systematically. On the finite tasks, Transformer-XL is more sample efficient on Mystery Path and outperforms GRU on Mortar Mayhem, while GRU is more efficient on Searing Spotlights. In all endless tasks, however, GRU consistently and significantly surpasses Transformer-XL, pointing to an advantage of recurrent architectures in long-term state tracking and memory stability. The framework establishes a new benchmark for memory-augmented agents and offers empirical insight into architectural trade-offs for persistent memory.
📝 Abstract
Memory Gym presents a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark memory capabilities in decision-making agents. These environments, originally with finite tasks, are expanded into innovative, endless formats, mirroring the escalating challenges of cumulative memory games such as "I packed my bag". This progression in task design shifts the focus from merely assessing sample efficiency to also probing the levels of memory effectiveness in dynamic, prolonged scenarios. To address the gap in available memory-based Deep Reinforcement Learning baselines, we introduce an implementation that integrates Transformer-XL (TrXL) with Proximal Policy Optimization. This approach utilizes TrXL as a form of episodic memory, employing a sliding window technique. Our comparative study between the Gated Recurrent Unit (GRU) and TrXL reveals varied performances across different settings. TrXL, on the finite environments, demonstrates superior sample efficiency in Mystery Path and outperforms in Mortar Mayhem. However, GRU is more efficient on Searing Spotlights. Most notably, in all endless tasks, GRU makes a remarkable resurgence, consistently outperforming TrXL by significant margins. Website and Source Code: https://github.com/MarcoMeter/endless-memory-gym/
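The abstract describes using TrXL as episodic memory over a sliding window of past timesteps. A minimal sketch of that idea is shown below; the class name and zero-padding behavior are assumptions for illustration, not the authors' implementation (random stand-ins replace actual TrXL hidden states).

```python
from collections import deque

import numpy as np


class SlidingWindowMemory:
    """Hypothetical sketch of a sliding-window episodic memory: the agent
    attends over only the last `window` timestep embeddings, with older
    entries evicted automatically."""

    def __init__(self, window: int, dim: int):
        self.window = window
        self.dim = dim
        self.buffer = deque(maxlen=window)  # oldest entries fall out

    def add(self, embedding: np.ndarray) -> None:
        self.buffer.append(embedding)

    def context(self) -> np.ndarray:
        # Zero-pad when fewer than `window` steps have been observed,
        # so the attention context always has a fixed shape.
        pad = self.window - len(self.buffer)
        entries = [np.zeros(self.dim)] * pad + list(self.buffer)
        return np.stack(entries)


# Usage: push 10 step embeddings into a window of size 4;
# only the most recent 4 remain available as context.
memory = SlidingWindowMemory(window=4, dim=8)
for step in range(10):
    memory.add(np.full(8, float(step)))

ctx = memory.context()
print(ctx.shape)  # (4, 8)
```

The fixed-size window is what distinguishes this episodic scheme from a GRU, which compresses the entire history into a single recurrent state of constant size.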
Problem

Research questions and friction points this paper is trying to address.

Dynamic Decision Making
Memory-intensive Tasks
Partial Observability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory Gym
Transformer-XL
Endless Tasks
Marco Pleines
TU Dortmund University
Deep Learning, Reinforcement Learning, Transformer, Recurrent Neural Networks
M. Pallasch
Department of Computer Science, TU Dortmund University, Dortmund, 44227, Germany
Frank Zimmer
Department of Communication and Environment, Rhine-Waal University of Applied Sciences, Kamp-Lintfort, 47475, Germany
Mike Preuss
Universiteit Leiden
Artificial Intelligence, Games, ChemAI, Optimization, Social Media Computing