POPGym Arcade: Parallel Pixelated POMDPs

📅 2025-03-03
🤖 AI Summary
This work targets three obstacles in POMDP benchmarking: the lack of standardization, the difficulty of evaluating agents in pixel-based environments, and low training efficiency. It introduces POPGym Arcade, a pixel-based benchmark of seven environments, each with three difficulty levels and both fully and partially observable variants, all sharing a single observation and action space. The environments are JIT-compiled to run on hardware accelerators, which yields substantial speedups over CPU-bound environments and lets Podracer-style training architectures raise hardware utilization further. The authors evaluate memory models with a Podracer variant of Q-learning and generate memory saliency maps that show how memories propagate through learned policies. The contributions are: (1) an open-source, pixel-based POMDP benchmark; (2) an empirical evaluation of memory models across tasks and observability conditions; and (3) substantial gains in training speed and hardware utilization.

📝 Abstract
We introduce POPGym Arcade, a benchmark of seven pixel-based environments, each with three difficulty levels, that all share a single observation and action space. Each environment offers both fully observable and partially observable variants, enabling counterfactual studies on partial observability. POPGym Arcade uses JIT compilation on hardware accelerators to achieve substantial speedups over CPU-bound environments. This in turn enables Podracer-style architectures that further increase hardware utilization and training speed. We evaluate memory models on our environments using a Podracer variant of Q-learning and examine the results. Finally, we generate memory saliency maps, uncovering how memories propagate through policies. Our library is available at https://github.com/bolt-research/popgym_arcade.
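
The JIT-compiled, accelerator-resident design described in the abstract can be illustrated with a minimal JAX sketch. This is not the POPGym Arcade API: the toy grid world and the names `reset`, `step`, `render`, and `GRID` are hypothetical and chosen only to show how `jax.vmap` batches an environment and `jax.jit` compiles it so that many copies step in parallel on one device.

```python
import jax
import jax.numpy as jnp

GRID = 8  # hypothetical board size, not taken from POPGym Arcade


def reset(key):
    # State is the agent's (row, col) position on a GRID x GRID board.
    return jax.random.randint(key, (2,), 0, GRID)


def step(state, action):
    # Actions 0..3 move up, down, left, right; positions are clipped to the board.
    moves = jnp.array([[-1, 0], [1, 0], [0, -1], [0, 1]])
    next_state = jnp.clip(state + moves[action], 0, GRID - 1)
    # Reward 1.0 when the agent reaches the bottom-right corner, else 0.0.
    reward = jnp.all(next_state == GRID - 1).astype(jnp.float32)
    return next_state, reward


def render(state):
    # Pixel observation: a GRID x GRID image with one bright pixel at the agent.
    return jnp.zeros((GRID, GRID)).at[state[0], state[1]].set(1.0)


# vmap batches each function over many environments; jit compiles the batch.
batched_reset = jax.jit(jax.vmap(reset))
batched_step = jax.jit(jax.vmap(step))
batched_render = jax.jit(jax.vmap(render))

num_envs = 1024
keys = jax.random.split(jax.random.PRNGKey(0), num_envs)
states = batched_reset(keys)
actions = jnp.zeros(num_envs, dtype=jnp.int32)
states, rewards = batched_step(states, actions)
frames = batched_render(states)  # shape (num_envs, GRID, GRID)
```

Because the environment state and the rendered frames never leave the accelerator, a learner compiled the same way can consume them without host-to-device copies, which is the property that Podracer-style training setups rely on.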

Problem

Research questions and friction points this paper is trying to address.

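POMDP benchmarks lack a standardized, pixel-based suite with graded difficulty levels.
The effect of partial observability is hard to study counterfactually without fully and partially observable variants of the same environment.
CPU-bound environments limit training speed and hardware utilization when evaluating memory models.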
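
A counterfactual comparison of observability requires two observation functions over the same underlying state and the same observation shape. The sketch below assumes a toy grid with an agent and a goal; the masking rule (the goal is visible only on the first step) and the names `render_full` and `render_partial` are hypothetical illustrations, not the rules used in POPGym Arcade.

```python
import jax.numpy as jnp

GRID = 8  # hypothetical board size


def render_full(agent, goal):
    # Fully observable frame: goal drawn at 0.5, agent drawn at 1.0.
    frame = jnp.zeros((GRID, GRID)).at[goal[0], goal[1]].set(0.5)
    return frame.at[agent[0], agent[1]].set(1.0)


def render_partial(agent, goal, step_count):
    # Partially observable frame: the goal is shown only on the first step,
    # so the agent must remember where it was afterwards.
    full = render_full(agent, goal)
    agent_only = jnp.zeros((GRID, GRID)).at[agent[0], agent[1]].set(1.0)
    return jnp.where(step_count == 0, full, agent_only)


agent = jnp.array([1, 2])
goal = jnp.array([5, 5])
full = render_full(agent, goal)
first = render_partial(agent, goal, 0)
later = render_partial(agent, goal, 3)
```

Both variants return arrays of identical shape, so the same network can be trained on either one and any performance gap can be attributed to the missing information rather than to a change in architecture.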

Innovation

Methods, ideas, or system contributions that make the work stand out.

JIT compilation on hardware accelerators
Podracer-style architectures for training
Memory saliency maps for policy analysis
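
One plausible way to compute a memory saliency map is to differentiate a recurrent policy's final Q-value with respect to every earlier observation. The sketch below makes that assumption explicit: the tanh RNN, the layer sizes, and the function names are hypothetical, and the paper's exact saliency definition may differ.

```python
import jax
import jax.numpy as jnp

OBS_DIM, HIDDEN, NUM_ACTIONS, T = 16, 8, 4, 5  # hypothetical sizes


def init_params(key):
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "w_in": 0.1 * jax.random.normal(k1, (OBS_DIM, HIDDEN)),
        "w_rec": 0.1 * jax.random.normal(k2, (HIDDEN, HIDDEN)),
        "w_out": 0.1 * jax.random.normal(k3, (HIDDEN, NUM_ACTIONS)),
    }


def final_q_values(params, observations):
    # Roll a simple tanh RNN over the sequence; read Q-values from the last state.
    def cell(hidden, obs):
        hidden = jnp.tanh(obs @ params["w_in"] + hidden @ params["w_rec"])
        return hidden, None

    hidden, _ = jax.lax.scan(cell, jnp.zeros(HIDDEN), observations)
    return hidden @ params["w_out"]


def memory_saliency(params, observations):
    # Absolute gradient of the greedy final Q-value w.r.t. each past observation.
    def greedy_q(obs_seq):
        return jnp.max(final_q_values(params, obs_seq))

    return jnp.abs(jax.grad(greedy_q)(observations))  # shape (T, OBS_DIM)


params = init_params(jax.random.PRNGKey(0))
observations = jax.random.normal(jax.random.PRNGKey(1), (T, OBS_DIM))
saliency = jax.jit(memory_saliency)(params, observations)
per_step = saliency.sum(axis=1)  # influence of each timestep on the final decision
```

With pixel observations, each row of `saliency` can be reshaped to the frame dimensions and displayed as a heat map, showing which pixels at which past timestep the final decision still depends on.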