BioProVLA-Agent: An Affordable, Protocol-Driven, Vision-Enhanced VLA-Enabled Embodied Multi-Agent System with Closed-Loop-Capable Reasoning for Biological Laboratory Manipulation

📅 2026-05-08

🤖 AI Summary
This work targets three obstacles to automating biological experiments: unstructured protocols, transparent or reflective labware that is hard to recognize, and the lack of state awareness in multi-step procedures. It proposes a protocol-centric embodied multi-agent framework that integrates protocol parsing, vision-based state verification, and action execution in a closed-loop workflow, aiming for flexible laboratory automation without costly hardware or fixed pipelines. The main components are a tailored LLM protocol agent, a VLM-RAG agent for visual state verification, a lightweight VLA policy for execution, and AugSmolVLA, an online visual augmentation strategy that targets wet-lab visual perturbations. On a benchmark of 15 atomic tasks, 6 composite workflows, and 3 bimanual tasks, AugSmolVLA improves execution stability over ACT, X-VLA, and the original SmolVLA under both normal and high-exposure conditions, especially for precise placement and transparent-object manipulation.
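To make the idea of online visual augmentation concrete, here is a minimal NumPy sketch of a per-frame transform that simulates overexposure, a global illumination shift, and a glare-like highlight. It is only an illustration: the function name `augment_frame`, its parameters, and the specific transforms are assumptions chosen for this example, not the AugSmolVLA recipe, which this page does not specify.

```python
import numpy as np


def augment_frame(img, rng, max_gain=1.8, max_shift=0.15, glare_prob=0.5):
    """Return a perturbed copy of `img` (float array, H x W x 3, values in [0, 1]).

    Hypothetical transforms, chosen only to illustrate the idea:
    exposure gain, additive brightness shift, and a soft bright blob.
    """
    # Exposure gain >= 1 pushes bright regions toward saturation.
    out = img.astype(np.float32) * rng.uniform(1.0, max_gain)
    # Global illumination shift (can brighten or darken the whole frame).
    out = out + rng.uniform(-max_shift, max_shift)
    # Occasionally add a Gaussian highlight to mimic glare on reflective labware.
    if rng.random() < glare_prob:
        h, w = out.shape[:2]
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)
        sigma = rng.uniform(0.05, 0.2) * min(h, w)
        yy, xx = np.mgrid[0:h, 0:w]
        blob = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
        out = out + blob[..., None]
    # Clip back to the valid range; clipping is what produces washed-out pixels.
    return np.clip(out, 0.0, 1.0).astype(np.float32)


rng = np.random.default_rng(0)
frame = np.full((32, 48, 3), 0.5, dtype=np.float32)  # stand-in for a camera frame
augmented = augment_frame(frame, rng)
```

In an online setting, a transform like this would be drawn afresh for every training sample, so the policy rarely sees the same lighting twice.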
📝 Abstract
Biological laboratory automation can reduce repetitive manual work and improve reproducibility, but reliable embodied execution in wet-lab environments remains challenging. Protocols are often unstructured, labware is frequently transparent or reflective, and multi-step procedures require state-aware execution beyond one-shot instruction following. Existing robotic systems often rely on costly hardware, fixed workflows, dedicated instruments, or robotics-oriented interfaces. Here, we introduce BioProVLA-Agent, an affordable, protocol-driven, vision-enhanced embodied multi-agent system enabled by Vision-Language-Action (VLA) models for biological manipulation. The system uses protocols as the task interface and integrates protocol parsing, visual state verification, and embodied execution in a closed-loop workflow. A Tailored LLM Protocol Agent converts protocols into verifiable subtasks; a VLM-RAG Verification Agent assesses readiness and completion using observations, robot states, retrieved knowledge, and success/failure examples; and a VLA Embodied Agent executes verified subtasks through a lightweight policy. To improve robustness under wet-lab visual perturbations, we develop AugSmolVLA, an online augmentation strategy targeting transparent labware, reflections, illumination shifts, and overexposure. We evaluate the system on a hierarchical benchmark covering 15 atomic tasks, 6 composite workflows, and 3 bimanual tasks, including tube loading, sorting, waste disposal, cap twisting, and liquid pouring. Across normal and high-exposure settings, AugSmolVLA improves execution stability over ACT, X-VLA, and the original SmolVLA, especially for precise placement, transparent-object manipulation, composite workflows, and visually degraded scenes. These results suggest a practical route toward accessible, protocol-centered, and verification-capable embodied AI for biological manipulation.
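The closed-loop workflow described above (parse the protocol into subtasks, check readiness, execute, then check completion) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `Subtask`, `run_protocol`, the `verify` and `execute` callables, and the retry limit are hypothetical names introduced here, and the real agents (LLM, VLM-RAG, VLA policy) are replaced by plain functions.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    precondition: str   # state that must hold before execution (readiness)
    postcondition: str  # state that must hold afterwards (completion)


def run_protocol(subtasks, verify, execute, max_retries=2):
    """Run subtasks in order and return a list of (name, status) records.

    `verify(condition) -> bool` stands in for the verification agent and
    `execute(subtask)` for the embodied policy; both are assumptions.
    """
    log = []
    for task in subtasks:
        # Readiness check: do not act if the scene is not in the expected state.
        if not verify(task.precondition):
            log.append((task.name, "not_ready"))
            break
        for _ in range(max_retries + 1):
            execute(task)
            # Completion check closes the loop before moving to the next step.
            if verify(task.postcondition):
                log.append((task.name, "done"))
                break
        else:
            # All attempts used without passing verification: stop the protocol.
            log.append((task.name, "failed"))
            break
    return log


# Toy usage: a set of strings plays the role of the observed lab state,
# and the first execution attempt silently fails.
state = {"tube_in_rack"}
attempts = {"n": 0}


def verify(condition):
    return condition in state


def execute(task):
    attempts["n"] += 1
    if attempts["n"] >= 2:
        state.add(task.postcondition)


tasks = [
    Subtask("twist cap open", "tube_in_rack", "cap_open"),
    Subtask("pour liquid", "cap_open", "liquid_poured"),
]
log = run_protocol(tasks, verify, execute)
```

The point of the design is that a failed first attempt is caught by the completion check and retried, rather than letting later steps run on a wrong state.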
Problem

Research questions and friction points this paper is trying to address.

biological laboratory automation
embodied AI
vision-language-action
protocol-driven execution
wet-lab manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action (VLA)
protocol-driven automation
closed-loop verification
embodied AI
visual robustness augmentation
Authors

Zhaohui Du
Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, CN; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, CN.

Zhe Wang
Professor of Computer Science & Engineering, East China University of Science & Technology
Machine Learning, Pattern Recognition, Medical Data Processing, Image Analysis, Artificial Intelligence

Hongmei Fei
School of Information Science and Technology, Shihezi University, Shihezi, CN.

Xiwen Cao
Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, CN; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, CN.

Ting Xiao
East China University of Science and Technology
Medical Image Analysis, Few-shot Learning, Reinforcement Learning

Qi Wang
Shanghai Jiao Tong University << UCAS
Reinforcement Learning, World Models, Computer Vision

Huanbo Jin
Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, CN; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, CN.

Jiaming Gu
Institute of Automation, Chinese Academy of Sciences
Computer Vision

Quan Lu
Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, CN; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, CN.

Zhe Liu
Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai, CN; Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, CN.