AI Summary
Problem: Existing research lacks a structured, multi-objective evaluation benchmark for bimodal (sweeping + grasping) mobile cleaning robots, leading to a disconnect between academic development and real-world deployment.
Method: This paper introduces the first embodied intelligence benchmark for cleaning tasks, built on NVIDIA Isaac Sim. It integrates a suction-sweep module and a 6-DOF robotic arm, supporting both hand-crafted and procedurally generated environments. We propose a reproducible and extensible evaluation framework assessing four dimensions: task completion, spatial efficiency, motion quality, and control performance. Additionally, we design heuristic and map-based planning baseline agents to enable systematic evaluation, from skill-level execution to full-scene operation.
Contribution/Results: The benchmark fills a critical gap in bimodal cleaning robot evaluation, enabling rigorous, standardized assessment of algorithmic robustness and generalization across diverse cleaning scenarios.
Abstract
Embodied AI benchmarks have advanced navigation, manipulation, and reasoning, but most target complex humanoid agents or large-scale simulations that are far from real-world deployment. In contrast, mobile cleaning robots with dual-mode capabilities, such as sweeping and grasping, are rapidly emerging as realistic and commercially viable platforms. However, no existing benchmark systematically evaluates these agents in structured, multi-target cleaning tasks, which leaves a critical gap between academic research and real-world applications. We introduce CleanUpBench, a reproducible and extensible benchmark for evaluating embodied agents in realistic indoor cleaning scenarios. Built on NVIDIA Isaac Sim, CleanUpBench simulates a mobile service robot equipped with a sweeping mechanism and a six-degree-of-freedom robotic arm, enabling interaction with heterogeneous objects. The benchmark includes manually designed environments and one procedurally generated layout to assess generalization, along with a comprehensive evaluation suite covering task completion, spatial efficiency, motion quality, and control performance. To support comparative studies, we provide baseline agents based on heuristic strategies and map-based planning. CleanUpBench bridges the gap between low-level skill evaluation and full-scene testing, offering a scalable testbed for grounded, embodied intelligence in everyday settings.
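To make the four evaluation dimensions concrete, the following is a minimal Python sketch of how one episode might be scored. It is an illustration only: the abstract does not define the individual metrics, so the `EpisodeLog` fields, the `evaluate` function, and the specific formulas (cleaned-target ratio, grid coverage, mean absolute acceleration, collision count) are assumptions and not CleanUpBench's actual API or metric definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass
class EpisodeLog:
    """Hypothetical per-episode record; all field names are illustrative."""
    targets_total: int                    # objects that should be swept or grasped
    targets_cleaned: int                  # objects actually removed
    free_cells: int                       # traversable cells in a grid map
    visited_cells: set[tuple[int, int]]   # cells the robot base passed through
    path: list[tuple[float, float]]       # base (x, y) positions in metres
    speeds: list[float]                   # base speed samples in m/s
    dt: float                             # sampling period in seconds
    collisions: int                       # contacts with non-target obstacles


def evaluate(log: EpisodeLog) -> dict[str, float]:
    """Score one episode along four assumed stand-ins for the paper's dimensions."""
    # Task completion: share of targets that were cleaned.
    completion = log.targets_cleaned / log.targets_total if log.targets_total else 0.0

    # Spatial efficiency: map coverage and total distance travelled.
    coverage = len(log.visited_cells) / log.free_cells if log.free_cells else 0.0
    path_length = sum(
        hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(log.path, log.path[1:])
    )

    # Motion quality: mean absolute acceleration (lower means smoother motion).
    accels = [abs(v1 - v0) / log.dt for v0, v1 in zip(log.speeds, log.speeds[1:])]
    mean_abs_accel = sum(accels) / len(accels) if accels else 0.0

    # Control performance: approximated here by the raw collision count.
    return {
        "completion": completion,
        "coverage": coverage,
        "path_length": path_length,
        "mean_abs_accel": mean_abs_accel,
        "collisions": float(log.collisions),
    }
```

Under these assumptions, a heuristic agent and a map-based planner could each be run on the same layouts and compared by averaging the dictionaries returned by `evaluate` over episodes.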