A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations

📅 2026-04-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high sample complexity and computational cost of reinforcement learning in decentralized partially observable Markov decision processes (Dec-POMDPs) by introducing HASE, a Dec-POMDP engine written natively in C++. HASE combines data-oriented design, explicit 64-byte cache-line alignment to remove false sharing, and a zero-copy PyTorch memory bridge, and it supports training with PPO, DQN, and SAC. In a single-agent configuration with 1,024 parallel environments, it sustains up to 33 million steps per second. The reported speedup over a single-threaded vectorized NumPy baseline is approximately 3,500 times, and cooperative multi-agent policies train in minutes.
📝 Abstract
Reinforcement Learning (RL) algorithms exhibit high sample complexity, particularly when applied to Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). In response, projects such as SampleFactory, EnvPool, Brax, and IsaacLab migrate parallel execution of classic environments such as MuJoCo and Atari into C++ thread pools or onto the GPU to decrease the computational cost of environment steps. We are interested in optimizing the decision level of human-AI joint operations, so we introduce a compute-efficient Dec-POMDP engine natively architected in C++ called Hide-And-Seek-Engine (HASE). By employing Data-Oriented Design (DOD) principles, explicit 64-byte cache-line alignment to remove false sharing, and a zero-copy PyTorch memory bridge using pinned memory and Direct Memory Access (DMA), our engine sustains throughput of up to 33,000,000 steps per second (SPS) in a single-agent, 1024-environment configuration with decentralized observations on an AMD Ryzen 9950X (16 cores). With ten agents, throughput drops to 7M SPS; for reference, generating random actions accounts for one third of the total runtime. The engine achieves a throughput increase of approximately 3,500$\times$ over the baseline single-threaded vectorized NumPy implementation and successfully trains cooperative multi-agent policies via PPO, DQN, and SAC in minutes, validating both its performance and generality.
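The abstract names two layout techniques, Data-Oriented Design and 64-byte cache-line alignment against false sharing. The C++ sketch below illustrates both in miniature; it is not taken from HASE. Every name in it (`kCacheLine`, `PaddedCounter`, `EnvBatch`, `step_range`, `total_steps`) is hypothetical, and the environment dynamics are a placeholder. It assumes a 64-byte cache line and a C++17 compiler, so that `std::vector` honors the over-aligned type when allocating.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed cache-line size; 64 bytes matches the figure quoted in the abstract.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-worker statistics slot. alignas pads the struct to a full
// cache line, so adjacent slots in an array never share a line and writes
// from different worker threads cannot cause false sharing.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t steps = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one slot per cache line");
static_assert(alignof(PaddedCounter) == kCacheLine, "slot starts on a line");

// Hypothetical structure-of-arrays batch: one contiguous array per field,
// so stepping many environments walks memory linearly.
struct EnvBatch {
    std::vector<float> x, y;  // placeholder agent positions, one per environment
    explicit EnvBatch(std::size_t n) : x(n, 0.0f), y(n, 0.0f) {}
    std::size_t size() const { return x.size(); }
};

// Steps environments [begin, end) for `iters` iterations with placeholder
// dynamics, recording the work in a counter owned by the calling worker.
void step_range(EnvBatch& b, std::size_t begin, std::size_t end,
                int iters, PaddedCounter& counter) {
    assert(begin <= end && end <= b.size());
    for (int t = 0; t < iters; ++t) {
        for (std::size_t i = begin; i < end; ++i) {
            b.x[i] += 1.0f;
            b.y[i] -= 1.0f;
            ++counter.steps;
        }
    }
}

// Sums the per-worker counters after all workers have finished.
std::uint64_t total_steps(const std::vector<PaddedCounter>& counters) {
    std::uint64_t total = 0;
    for (const auto& c : counters) total += c.steps;
    return total;
}
```

In a threaded engine, each worker would call `step_range` on its own slice of the batch with its own `PaddedCounter`. The padding costs 56 bytes per worker, in exchange for keeping each worker's writes on a cache line that no other worker touches.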
Problem

Research questions and friction points this paper is trying to address.

Dec-POMDP
sample complexity
multi-agent reinforcement learning
compute efficiency
high-throughput simulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dec-POMDP
Data-Oriented Design
zero-copy memory bridge
high-throughput RL
multi-agent reinforcement learning