Open-H-Embodiment: A Large-Scale Dataset for Enabling Foundation Models in Medical Robotics

📅 2026-04-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the longstanding scarcity of large-scale, multi-platform, openly shared datasets in medical robotics, which has hindered the development of foundation models. It presents Open-H-Embodiment, the largest open dataset to date of medical robot video paired with synchronized kinematics, spanning more than 49 institutions and diverse robotic platforms integrated through a unified cross-platform framework. Using this dataset, the authors introduce GR00T-H, the first open vision-language-action foundation model for medical robotics. It is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials versus 0% for all baselines) and reaches 64% average success across a 29-step ex vivo suturing sequence. They also train Cosmos-H-Surgical-Simulator, an action-conditioned world model that supports surgical simulation across nine robotic platforms from a single checkpoint, along with in silico policy evaluation and synthetic data generation.

📝 Abstract
Autonomous medical robots hold promise to improve patient outcomes, reduce provider workload, democratize access to care, and enable superhuman precision. However, autonomous medical robotics has been limited by a fundamental data problem: existing medical robotic datasets are small, single-embodiment, and rarely shared openly, which restricts the development of the foundation models the field needs to advance. We introduce Open-H-Embodiment, the largest open dataset of medical robotic video with synchronized kinematics to date. It spans more than 49 institutions and multiple robotic platforms, including the CMR Versius, Intuitive Surgical's da Vinci, the da Vinci Research Kit (dVRK), Rob Surgical BiTrack, Virtual Incision's MIRA, Moon Surgical Maestro, and a variety of custom systems, and it covers surgical manipulation, robotic ultrasound, and endoscopy procedures. We demonstrate the research enabled by this dataset through two foundation models. GR00T-H, the first open vision-language-action foundation model for medical robotics, is the only evaluated model to achieve full end-to-end task completion on a structured suturing benchmark (25% of trials vs. 0% for all others), and it achieves 64% average success across a 29-step ex vivo suturing sequence. We also train Cosmos-H-Surgical-Simulator, the first action-conditioned world model to enable multi-embodiment surgical simulation from a single checkpoint, spanning nine robotic platforms and supporting in silico policy evaluation and synthetic data generation for the medical domain. These results suggest that open, large-scale medical robot data collection can serve as critical infrastructure for the research community, enabling advances in robot learning, world modeling, and beyond.
Problem

Research questions and friction points this paper is trying to address.

medical robotics
foundation models
robotic datasets
data scarcity
multi-embodiment
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation models
medical robotics
open dataset
multi-embodiment
world modeling
Nigel Nelson
NVIDIA
Juo-Tung Chen
Johns Hopkins University
Robot Learning, Surgical Robotics
Jesse Haworth
Johns Hopkins University
Xinhao Chen
Johns Hopkins University
Lukas Zbinden
NVIDIA
Dianye Huang
Technical University of Munich
robotic ultrasound, medical robot, intelligent control, human robot interaction
Alaa Eldin Abdelaal
NSERC Postdoctoral Fellow, Stanford University
Automation, Human-Robot Interaction, Surgical Robotics
Alberto Arezzo
Department of Surgical Sciences, University of Torino
surgery, robotics, endoscopy, laparoscopy
Ayberk Acar
Computer Science Ph.D. Student, Vanderbilt University
Medical Imaging, Surgical Robotics, Extended Reality, Computer Vision
Farshid Alambeigi
Associate Professor, University of Texas at Austin
Medical robotics, Surgical robotics, Surgical Autonomy, Surgineering, Soft robotics
Carlo Alberto Ammirati
University of Turin
Yunke Ao
Balgrist University Hospital
Pablo David Aranda Rodriguez
ImFusion GmbH
Soofiyan Atar
Ph.D. Student at University of California San Diego
Surgical Robotics, Bi-manipulation, computer vision
Mattia Ballo
Sano Centre for Computational Medicine
Noah Barnes
Johns Hopkins University
Federica Barontini
University of Turin
Filip Binkiewicz
CMR Surgical
Peter Black
UBC
Urology
Sebastian Bodenstedt
National Center for Tumor Diseases (NCT) Dresden
Leonardo Borgioli
PhD Student, University of Illinois at Chicago
Nikola Budjak
ImFusion GmbH
Benjamin Calmé
University of Leeds