LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

📅 2025-05-28
🤖 AI Summary
Scientific embodied agents for laboratory automation are hindered by the absence of high-fidelity simulators and systematic evaluation benchmarks. To address this, LabUtopia introduces three components: LabSim, a laboratory simulator supporting multi-physics coupling and chemically meaningful semantic interactions; LabScene, a scalable procedural generator for scientific scenes; and LabBench, a five-level hierarchical benchmark spanning atomic actions to long-horizon mobile manipulation. By combining multi-physics simulation, hierarchical task abstraction, and embodied reasoning evaluation, the framework supports large-scale training and rigorous assessment of joint perception, planning, and control across 30 scientific tasks and more than 200 scene and instrument assets. The authors present it as foundational infrastructure for developing and evaluating scientific embodied intelligence, including probing the generalization limits of current agents.

📝 Abstract
Scientific embodied agents play a crucial role in modern laboratories by automating complex experimental workflows. Compared to typical household environments, laboratory settings impose significantly higher demands on the perception of physical-chemical transformations and on long-horizon planning, making them an ideal testbed for advancing embodied intelligence. However, progress in this area has long been hampered by the lack of suitable simulators and benchmarks. In this paper, we address this gap by introducing LabUtopia, a comprehensive simulation and benchmarking suite designed to facilitate the development of generalizable, reasoning-capable embodied agents in laboratory settings. Specifically, it integrates i) LabSim, a high-fidelity simulator supporting multi-physics and chemically meaningful interactions; ii) LabScene, a scalable procedural generator for diverse scientific scenes; and iii) LabBench, a hierarchical benchmark spanning five levels of complexity, from atomic actions to long-horizon mobile manipulation. LabUtopia supports 30 distinct tasks and includes more than 200 scene and instrument assets, enabling large-scale training and principled evaluation in high-complexity environments. We demonstrate that LabUtopia offers a powerful platform for advancing the integration of perception, planning, and control in scientific agents and provides a rigorous testbed for exploring the practical capabilities and generalization limits of embodied intelligence in future research.

Problem

Lack of a high-fidelity simulator for scientific embodied agents
Absence of hierarchical benchmarks for laboratory tasks
Need for scalable procedural generation of diverse scientific scenes

Innovation

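High-fidelity simulator (LabSim) for multi-physics and chemically meaningful interactions
Scalable procedural generator (LabScene) for diverse scientific scenes
Hierarchical benchmark (LabBench) with five complexity levels, from atomic actions to long-horizon mobile manipulation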