Sci-VLA: Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments

📅 2026-02-10
🤖 AI Summary
This work addresses the limited compositional generalization of current vision-language-action (VLA) models when executing long-horizon scientific experiments composed of multiple atomic tasks, in particular their inability to perform the transitional operations between atomic tasks in sequences not seen during training. The authors propose a plug-in agent based on a large language model (LLM) that generates transition-action code at inference time, without any additional training, to guide the VLA model through novel experimental procedures. The approach is an inference-time agent intervention mechanism aimed at improving compositional generalization. Evaluated in newly built scientific-experiment scenes within an existing simulation environment, the method improves the average success rate per atomic task by 42% and is shown to transfer from simulation to a real laboratory setting.

📝 Abstract
Robotic laboratories play a critical role in autonomous scientific discovery by enabling scalable, continuous experimental execution. Recent vision-language-action (VLA) models offer a promising foundation for robotic laboratories. However, scientific experiments typically involve long-horizon tasks composed of multiple atomic tasks, posing a fundamental challenge to existing VLA models. While VLA models fine-tuned for scientific tasks can reliably execute atomic experimental actions seen during training, they often fail to perform composite tasks formed by reordering and composing these known atomic actions. This limitation arises from a distributional mismatch between training-time atomic tasks and inference-time composite tasks, which prevents VLA models from executing necessary transitional operations between atomic tasks. To address this challenge, we propose an Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments. It introduces an LLM-based agentic inference mechanism that intervenes when executing sequential manipulation tasks. By performing explicit transition inference and generating transitional robotic action code, the proposed plugin guides VLA models through missing transitional steps, enabling reliable execution of composite scientific workflows without any additional training. This inference-only intervention makes our method computationally efficient, data-efficient, and well-suited for open-ended and long-horizon robotic laboratory tasks. We build 3D assets of scientific instruments and common scientific operating scenes within an existing simulation environment. In these scenes, we have verified that our method increases the average success rate per atomic task by 42% during inference. Furthermore, we show that our method can be easily transferred from the simulation to real scientific laboratories.
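
The control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the task names, poses, `run_vla`, `stub_llm_transition`, and the `move_to` primitive are all hypothetical stand-ins, and the "LLM" is a hard-coded stub. The sketch assumes that a fine-tuned VLA policy succeeds on an atomic task only when the robot starts from the pose seen in training, and that the plugin bridges the gap by generating and executing a short piece of transition code before handing control back to the VLA.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical atomic tasks: the pose each was trained to start from, and the
# pose the robot is left in afterwards. None of these names come from the paper.
START_POSE = {"pick_tube": "home", "pour_liquid": "above_beaker",
              "press_button": "front_of_device"}
END_POSE = {"pick_tube": "above_rack", "pour_liquid": "above_beaker",
            "press_button": "front_of_device"}


@dataclass
class RobotState:
    """Toy stand-in for the robot and scene state."""
    gripper_pose: str = "home"
    log: List[str] = field(default_factory=list)


def run_vla(task: str, state: RobotState) -> bool:
    """Stand-in for a VLA policy: succeeds only from its training-time start pose."""
    if state.gripper_pose != START_POSE[task]:
        state.log.append(f"vla:{task}:failed")
        return False
    state.gripper_pose = END_POSE[task]
    state.log.append(f"vla:{task}:ok")
    return True


def stub_llm_transition(prev_task: str, next_task: str) -> str:
    """Stand-in for an LLM call that returns transition code as text.

    A real agent would reason over the scene and both task descriptions;
    this stub simply looks up where the next task expects to start.
    """
    return f"move_to({START_POSE[next_task]!r})"


def execute_transition(code: str, state: RobotState) -> None:
    """Run generated code against a small whitelist of motion primitives."""
    def move_to(pose: str) -> None:
        state.gripper_pose = pose
        state.log.append(f"transition:move_to:{pose}")

    # Only `move_to` is exposed to the generated code. Emptying __builtins__
    # is an illustration, not a real security sandbox.
    exec(code, {"__builtins__": {}, "move_to": move_to})


def run_composite(tasks: List[str], use_plugin: bool) -> Tuple[List[bool], RobotState]:
    """Execute atomic tasks in order, optionally inserting generated transitions."""
    state = RobotState()
    results: List[bool] = []
    prev = None
    for task in tasks:
        if use_plugin and prev is not None:
            # Inference-time intervention between two atomic tasks; no training.
            execute_transition(stub_llm_transition(prev, task), state)
        results.append(run_vla(task, state))
        prev = task
    return results, state


if __name__ == "__main__":
    sequence = ["pick_tube", "pour_liquid", "press_button"]
    for flag in (False, True):
        outcome, _ = run_composite(sequence, use_plugin=flag)
        print(f"use_plugin={flag}: {outcome}")
```

In this toy setting the second and third tasks fail without the plugin because the robot is never moved to their expected start poses. The numbers it produces only illustrate the control flow and say nothing about the reported 42% improvement.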
Problem

Research questions and friction points this paper is trying to address.

long-horizon tasks
vision-language-action models
distributional mismatch
scientific experiments
robotic laboratories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic VLA
Long-horizon tasks
Transition inference
Robotic laboratories
Vision-language-action models
Authors
Yiwen Pang
School of Computer Science and Engineering & School of Software Engineering & School of Artificial Intelligence, Southeast University, Nanjing, China
Bo Zhou
School of Computer Science and Engineering & School of Software Engineering & School of Artificial Intelligence, Southeast University, Nanjing, China
Changjin Li
School of Computer Science and Engineering & School of Software Engineering & School of Artificial Intelligence, Southeast University, Nanjing, China
Xuanhao Wang
School of Computer Science and Engineering & School of Software Engineering & School of Artificial Intelligence, Southeast University, Nanjing, China
Shengxiang Xu
School of Computer Science and Engineering & School of Software Engineering & School of Artificial Intelligence, Southeast University, Nanjing, China
Deng-Bao Wang
School of Computer Science and Engineering & School of Software Engineering & School of Artificial Intelligence, Southeast University, Nanjing, China
Min-Ling Zhang
Professor, School of Computer Science and Engineering, Southeast University, China
Artificial Intelligence, Machine Learning, Data Mining
Shimin Di
School of Computer Science and Engineering & School of Software Engineering & School of Artificial Intelligence, Southeast University, Nanjing, China