🤖 AI Summary
Reinforcement learning currently lacks a standardized benchmark for evaluating agent memory, a gap that is especially evident in tabletop robotic manipulation, where partial observability makes memory essential yet leaves it unassessed. To address this, the authors propose MIKASA (Memory-Intensive Skills Assessment Suite for Agents), built on three contributions: a classification framework for memory-intensive RL tasks; MIKASA-Base, a unified benchmark enabling systematic evaluation of memory-enhanced agents across diverse scenarios; and MIKASA-Robo, a suite of 32 carefully designed memory-intensive tasks for tabletop robotic manipulation. Together, these establish a reproducible framework for assessing and advancing memory in RL agents, supporting the development of more reliable systems for real-world applications.
📝 Abstract
Memory is crucial for enabling agents to tackle complex tasks with temporal and spatial dependencies. While many reinforcement learning (RL) algorithms incorporate memory, the field lacks a universal benchmark to assess an agent's memory capabilities across diverse scenarios. This gap is particularly evident in tabletop robotic manipulation, where memory is essential for solving tasks with partial observability and ensuring robust performance, yet no standardized benchmarks exist. To address this, we introduce MIKASA (Memory-Intensive Skills Assessment Suite for Agents), a comprehensive benchmark for memory RL, with three key contributions: (1) we propose a comprehensive classification framework for memory-intensive RL tasks, (2) we collect MIKASA-Base, a unified benchmark that enables systematic evaluation of memory-enhanced agents across diverse scenarios, and (3) we develop MIKASA-Robo, a novel benchmark of 32 carefully designed memory-intensive tasks that assess memory capabilities in tabletop robotic manipulation. Our contributions establish a unified framework for advancing memory RL research, driving the development of more reliable systems for real-world applications. The code is available at https://sites.google.com/view/memorybenchrobots/.
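To make the notion of a "memory-intensive task under partial observability" concrete, here is a minimal toy sketch (not from the paper; the environment, class name, and interface are illustrative assumptions). A cue is shown only on the first step, and after several blank steps the agent must act on the remembered cue, so any memoryless policy can do no better than chance:

```python
import random


class RecallCueEnv:
    """Toy partially observable task (illustrative, not part of MIKASA).

    A binary cue is observable only at t=0; all later observations are
    blank (-1). After `delay` blank steps, the agent's action is compared
    against the cue. Solving the task requires carrying the cue in memory.
    """

    def __init__(self, delay=3, seed=None):
        self.delay = delay
        self.rng = random.Random(seed)

    def reset(self):
        self.cue = self.rng.randint(0, 1)
        self.t = 0
        return self.cue  # the cue is visible only at this step

    def step(self, action):
        self.t += 1
        if self.t <= self.delay:
            return -1, 0.0, False  # blank observation, no reward yet
        reward = 1.0 if action == self.cue else 0.0
        return -1, reward, True  # episode ends at the recall step


# A policy with memory stores the initial cue and replays it at recall time.
env = RecallCueEnv(delay=3, seed=0)
remembered = env.reset()
done = False
while not done:
    obs, reward, done = env.step(remembered)
print(reward)  # 1.0 for the memory-equipped policy
```

A memoryless policy, which sees only the blank observation at the recall step, succeeds on average half the time, which is exactly the failure mode such benchmarks are designed to expose.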