RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

📅 2026-03-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of a reproducible, large-scale benchmark for systematically evaluating generalist robots on everyday tasks in human environments. The authors introduce RoboCasa365, a simulation benchmark for household mobile manipulation built on the RoboCasa platform. It comprises 365 everyday tasks across 2,500 diverse kitchen environments, with over 600 hours of human demonstrations and over 1,600 hours of synthetically generated demonstrations. The benchmark supports systematic evaluation in multi-task learning, robot foundation model training, and lifelong learning. Experiments with state-of-the-art methods analyze how task diversity, dataset scale, and environment variation affect the generalization of generalist policies, and the results are intended to inform future research on general-purpose robots.

📝 Abstract

Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present RoboCasa365, a comprehensive simulation benchmark for household mobile manipulation. Built on the RoboCasa platform, RoboCasa365 introduces 365 everyday tasks across 2,500 diverse kitchen environments, with over 600 hours of human demonstration data and over 1,600 hours of synthetically generated demonstration data, making it one of the most diverse and large-scale resources for studying generalist policies. RoboCasa365 is designed to support systematic evaluations for different problem settings, including multi-task learning, robot foundation model training, and lifelong learning. We conduct extensive experiments on this benchmark with state-of-the-art methods and analyze the impacts of task diversity, dataset scale, and environment variation on generalization. Our results provide new insights into what factors most strongly affect the performance of generalist robots and inform strategies for future progress in the field.

Problem

Research questions and friction points this paper is trying to address.

generalist robots

benchmark

robot learning

household tasks

simulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

simulation benchmark

generalist robots

mobile manipulation