RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation

📅 2025-09-10
🤖 AI Summary
This work addresses long-horizon, safety-critical automation in robotic chemical experimentation. It proposes a dual-loop framework that integrates Vision-Language Models (VLMs) with Vision-Language-Action (VLA) models. The VLM takes three roles (planning, guidance, and monitoring) and interfaces with an image-goal-conditioned VLA, using semantic feedback to support task decomposition, precise execution, and real-time verification of compliance with experimental norms, particularly for hazardous, transparent, and deformable materials. The key contribution is unifying autonomous execution of experimental workflows with compliance monitoring. Evaluated on multi-step chemical procedures, the system achieves a 23.57% higher average success rate and a 0.298 average increase in compliance rate over state-of-the-art VLA baselines, and it generalizes across tasks and objects.

📝 Abstract
Robotic chemists promise to both liberate human experts from repetitive tasks and accelerate scientific discovery, yet remain in their infancy. Chemical experiments involve long-horizon procedures over hazardous and deformable substances, where success requires not only task completion but also strict compliance with experimental norms. To address these challenges, we propose RoboChemist, a dual-loop framework that integrates Vision-Language Models (VLMs) with Vision-Language-Action (VLA) models. Unlike prior VLM-based systems (e.g., VoxPoser, ReKep) that rely on depth perception and struggle with transparent labware, and existing VLA systems (e.g., RDT, pi0) that lack semantic-level feedback for complex tasks, our method leverages a VLM to serve as (1) a planner to decompose tasks into primitive actions, (2) a visual prompt generator to guide VLA models, and (3) a monitor to assess task success and regulatory compliance. Notably, we introduce a VLA interface that accepts image-based visual targets from the VLM, enabling precise, goal-conditioned control. Our system successfully executes both primitive actions and complete multi-step chemistry protocols. Results show a 23.57% higher average success rate and a 0.298 average increase in compliance rate over state-of-the-art VLA baselines, while also demonstrating strong generalization to objects and tasks.
Problem

Research questions and friction points this paper is trying to address.

Long-horizon robotic chemical experimentation procedures
Safety compliance with hazardous and deformable substances
Integration of vision-language models for complex task execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-loop framework integrating VLMs and VLAs
VLM as planner, prompt generator, and monitor
Image-based visual targets for precise control
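
The three VLM roles and the VLA executor listed above can be sketched as a small control loop. This is only an illustrative sketch, not the authors' implementation: the abstract does not say what the two loops are, so the code assumes an outer loop that walks the planned primitives and an inner loop that retries a primitive until the monitor reports both success and compliance. Every name here (`run_protocol`, `Verdict`, `RunLog`, the callables, the primitive strings) is hypothetical, and the VLM and VLA are replaced by plain Python stubs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Verdict:
    """Monitor output (hypothetical): did the step succeed, and was it compliant?"""
    success: bool
    compliant: bool


@dataclass
class RunLog:
    completed: List[str] = field(default_factory=list)
    attempts: int = 0
    violations: int = 0
    finished: bool = False


def run_protocol(
    task: str,
    plan: Callable[[str], List[str]],       # VLM as planner (stubbed)
    make_goal: Callable[[str], str],        # VLM as visual prompt generator (stubbed)
    act: Callable[[str, str], str],         # VLA executor, conditioned on the goal (stubbed)
    monitor: Callable[[str, str], Verdict], # VLM as monitor (stubbed)
    max_retries: int = 2,
) -> RunLog:
    log = RunLog()
    # Assumed outer loop: step through the primitives produced by the planner.
    for primitive in plan(task):
        # Assumed inner loop: attempt, check, and retry a single primitive.
        for _ in range(max_retries + 1):
            goal = make_goal(primitive)
            observation = act(primitive, goal)
            verdict = monitor(primitive, observation)
            log.attempts += 1
            if not verdict.compliant:
                log.violations += 1
            if verdict.success and verdict.compliant:
                log.completed.append(primitive)
                break
        else:
            # Retries exhausted: stop instead of continuing an unsafe protocol.
            return log
    log.finished = True
    return log


def demo() -> RunLog:
    """Toy run in which the monitor rejects the first pour as non-compliant."""
    pour_calls = {"n": 0}

    def plan(task: str) -> List[str]:
        return ["grasp beaker", "pour solution", "stir"]  # made-up primitives

    def make_goal(primitive: str) -> str:
        return f"goal-image:{primitive}"  # stands in for an image-based target

    def act(primitive: str, goal: str) -> str:
        return f"observation after {primitive}"

    def monitor(primitive: str, observation: str) -> Verdict:
        if primitive == "pour solution":
            pour_calls["n"] += 1
            return Verdict(success=True, compliant=pour_calls["n"] > 1)
        return Verdict(success=True, compliant=True)

    return run_protocol("mix two solutions", plan, make_goal, act, monitor)
```

Calling `demo()` runs three primitives with one retry of the pour step. In this sketch a step counts as complete only when it is both successful and compliant, which mirrors the abstract's point that task completion alone is not sufficient.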
Zongzheng Zhang
Beijing Academy of Artificial Intelligence, BAAI
Chenghao Yue
Beijing Academy of Artificial Intelligence, BAAI
Haobo Xu
Institute for AI Industry Research (AIR), Tsinghua University
Minwen Liao
Beijing Academy of Artificial Intelligence, BAAI
Xianglin Qi
Beijing Academy of Artificial Intelligence, BAAI
Huan-ang Gao
Ph.D. student, Tsinghua University
AgentVision & Robotics
Ziwei Wang
Nanyang Technological University
Hao Zhao
Institute for AI Industry Research (AIR), Tsinghua University