🤖 AI Summary
Existing encoder designs for image-based deep reinforcement learning suffer from low parameter efficiency and poor generalization, especially across game levels, and recent work leans heavily on scaling model capacity. Method: We propose Impoola-CNN, a lightweight CNN encoder derived from Impala-CNN (which already uses ResNet-style residual blocks) that replaces the final flattening of output feature maps with global average pooling (GAP). Contribution/Results: We empirically demonstrate that GAP significantly improves cross-level generalization on the Procgen Benchmark and reduces translation sensitivity, challenging the "bigger is better" paradigm. Impoola-CNN achieves superior stability and sample efficiency with substantially fewer parameters. Notably, it shows the largest gains over bigger models on games without agent-centered observations, pointing to efficient network design as a key principle for compact visual encoders in deep RL.
📝 Abstract
As image-based deep reinforcement learning tackles more challenging tasks, increasing model size has become an important factor in improving performance. Recent studies achieved this by focusing on the parameter efficiency of scaled networks, typically using Impala-CNN, a 15-layer ResNet-inspired network, as the image encoder. However, while Impala-CNN evidently outperforms older CNN architectures, potential advancements in network design for deep reinforcement learning-specific image encoders remain largely unexplored. We find that replacing the flattening of output feature maps in Impala-CNN with global average pooling leads to a notable performance improvement. This approach outperforms larger and more complex models in the Procgen Benchmark, particularly in terms of generalization. We call our proposed encoder model Impoola-CNN. A decrease in the network's translation sensitivity may be central to this improvement, as we observe the most significant gains in games without agent-centered observations. Our results demonstrate that network scaling is not just about increasing model size; efficient network design is also an essential factor.
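To make the core architectural change concrete, here is a minimal NumPy sketch (not the authors' code) contrasting the two readout options on a hypothetical final feature map of an Impala-CNN-style encoder. Flattening preserves the spatial layout and produces a large vector that feeds the next linear layer, while global average pooling collapses each channel to its spatial mean, yielding far fewer features and a representation that is insensitive to (circular) shifts of the feature map; the shapes and channel count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical final feature maps of the encoder: (channels, height, width).
fmap = rng.standard_normal((32, 8, 8))

# Option 1 (Impala-CNN): flatten. The next linear layer sees 32*8*8 = 2048
# inputs, and its parameter count scales with the spatial resolution.
flat = fmap.reshape(-1)

# Option 2 (Impoola-CNN): global average pooling. One value per channel,
# so the next linear layer sees only 32 inputs.
gap = fmap.mean(axis=(1, 2))

print(flat.shape)  # (2048,)
print(gap.shape)   # (32,)

# A circular shift of the feature map leaves the per-channel mean unchanged,
# but reorders the flattened vector: GAP is translation-insensitive, flatten is not.
shifted = np.roll(fmap, shift=3, axis=2)
print(np.allclose(shifted.mean(axis=(1, 2)), gap))  # True
print(np.array_equal(shifted.reshape(-1), flat))    # False
```

The parameter saving compounds in the first fully connected layer after the encoder: its input width shrinks by a factor of H*W (here 64x), which is one plausible reason the pooled encoder scales more efficiently.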