RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark

📅 2026-05-11

🤖 AI Summary

Existing robotic memory benchmarks lack multimodal annotations for memory formation, cover a narrow range of structurally simple tasks, and are evaluated only in simulation. This work introduces RoboMemArena, a large-scale, long-horizon memory benchmark of 26 tasks whose trajectories average more than 1,000 steps and in which 68.9% of subtasks are memory-dependent. Its generation pipeline uses a vision-language model (VLM) to design and compose subtasks and supplies subtask instructions and keyframe annotations, with paired real-world tasks for physical evaluation. The work also proposes PrediMem, a dual-system vision-language-action (VLA) model in which a high-level VLM planner manages a memory bank of recent and keyframe buffers and uses a predictive coding head to improve sensitivity to task dynamics. In experiments on RoboMemArena, PrediMem outperforms all baselines.

📝 Abstract

Memory is a critical component of robotic intelligence, as robots must rely on past observations and actions to accomplish long-horizon tasks in partially observable environments. However, existing robotic memory benchmarks still lack multimodal annotations for memory formation, provide limited task coverage and structural complexity, and remain restricted to simulation without real-world evaluation. We address this gap with RoboMemArena, a large-scale benchmark of 26 tasks, with average trajectory lengths exceeding 1,000 steps per task and 68.9% of subtasks being memory-dependent. The generation pipeline leverages a vision-language model (VLM) to design and compose subtasks, generates full trajectories through atomic functions, and provides memory-related annotations, including subtask instructions and native keyframe annotations, while paired real-world memory tasks support physical evaluation. We further design PrediMem, a dual-system VLA in which a high-level VLM planner manages a memory bank with recent and keyframe buffers and uses a predictive coding head to improve sensitivity to task dynamics. Extensive experiments on RoboMemArena show that PrediMem outperforms all baselines and provides insights into memory management, model architecture, and scaling laws for complex memory systems.
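
The abstract describes a memory bank with recent and keyframe buffers but does not give its implementation here. The following is a minimal sketch of that two-buffer idea, not the authors' code. The names `Frame`, `MemoryBank`, `recent_size`, `keyframe_size`, and `demo_context_steps` are hypothetical, and the keyframe decision is passed in as a plain boolean flag rather than produced by a VLM planner or a predictive coding head.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List


@dataclass
class Frame:
    """One observation in a trajectory (hypothetical container)."""
    step: int
    observation: Any
    is_keyframe: bool = False  # assumed to be decided upstream, e.g. by a planner


class MemoryBank:
    """Illustrative two-buffer memory: a sliding window plus a capped keyframe store."""

    def __init__(self, recent_size: int = 4, keyframe_size: int = 8) -> None:
        # deque(maxlen=...) silently evicts the oldest entry once full.
        self.recent: Deque[Frame] = deque(maxlen=recent_size)
        self.keyframes: Deque[Frame] = deque(maxlen=keyframe_size)

    def add(self, frame: Frame) -> None:
        """Every frame enters the recent buffer; flagged frames are also kept long-term."""
        self.recent.append(frame)
        if frame.is_keyframe:
            self.keyframes.append(frame)

    def context(self) -> List[Frame]:
        """Return keyframes and recent frames, deduplicated by step, in time order."""
        merged = {f.step: f for f in self.keyframes}
        merged.update({f.step: f for f in self.recent})
        return [merged[s] for s in sorted(merged)]


def demo_context_steps() -> List[int]:
    """Run ten steps with keyframes at steps 1 and 5 and report what is retained."""
    bank = MemoryBank(recent_size=3, keyframe_size=2)
    for step in range(10):
        bank.add(Frame(step=step, observation=f"obs-{step}", is_keyframe=step in (1, 5)))
    return [f.step for f in bank.context()]


if __name__ == "__main__":
    print(demo_context_steps())
```

In this sketch the context handed to a planner stays bounded by `recent_size + keyframe_size` regardless of trajectory length, which is one plausible reason to separate the two buffers for trajectories of 1,000 or more steps. How PrediMem actually selects and evicts keyframes is not specified in this summary.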

Problem

Research questions and friction points this paper is trying to address.

robotic memory

memory benchmark

partially observable environments

long-horizon tasks

real-world evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

robotic memory benchmark

vision-language model

memory-dependent tasks

predictive coding

real-world evaluation
