Sci-VLA: Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments

📅 2026-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited compositional generalization of current vision-language-action (VLA) models on long-horizon scientific experiments composed of multiple atomic tasks, in particular their inability to perform the transitional operations between atomic tasks in sequences not seen during training. To overcome this, the authors propose a plug-in agent based on a large language model (LLM) that generates transition-action code at inference time, without any additional training, to guide the VLA model through novel experimental procedures. The approach is presented as an inference-time agent intervention mechanism that enhances compositional generalization. Evaluated in a newly constructed scientific experiment simulation environment, the method improves the average success rate per atomic task by 42% and is reported to transfer from simulation to a real-world laboratory setting.

📝 Abstract

Robotic laboratories play a critical role in autonomous scientific discovery by enabling scalable, continuous experimental execution. Recent vision-language-action (VLA) models offer a promising foundation for robotic laboratories. However, scientific experiments typically involve long-horizon tasks composed of multiple atomic tasks, which poses a fundamental challenge to existing VLA models. While VLA models fine-tuned for scientific tasks can reliably execute atomic experimental actions seen during training, they often fail on composite tasks formed by reordering and composing these known atomic actions. This limitation arises from a distributional mismatch between training-time atomic tasks and inference-time composite tasks, which prevents VLA models from executing the transitional operations needed between atomic tasks. To address this challenge, we propose Sci-VLA, an agentic VLA inference plugin for long-horizon tasks in scientific experiments. It introduces an LLM-based agentic inference mechanism that intervenes during the execution of sequential manipulation tasks. By performing explicit transition inference and generating transitional robotic action code, the plugin guides VLA models through the missing transitional steps, enabling reliable execution of composite scientific workflows without any additional training. Because the intervention happens only at inference time, the method is computationally efficient, data-efficient, and well suited to open-ended, long-horizon robotic laboratory tasks. We build 3D assets of scientific instruments and common scientific operating scenes within an existing simulation environment. In these scenes, we verify that our method increases the average success rate per atomic task by 42% during inference. Furthermore, we show that our method can be easily transferred from simulation to real scientific laboratories.
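The intervention described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the task names, the `RobotState` class, the rule that each atomic task only succeeds from the "home" pose it was trained on, and the `plan_transition` stub standing in for the LLM call are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RobotState:
    """Toy stand-in for the robot/scene state (hypothetical)."""
    gripper_at: str = "home"
    log: List[str] = field(default_factory=list)


# Hypothetical atomic tasks: each was "trained" from the home pose
# and leaves the gripper somewhere else when it finishes.
ATOMIC_TASKS = {
    "pick_up_beaker": {"start": "home", "end": "beaker_rack"},
    "pour_into_flask": {"start": "home", "end": "flask_station"},
    "press_stirrer_button": {"start": "home", "end": "stirrer"},
}


def run_vla_policy(task: str, state: RobotState) -> bool:
    """Stand-in for a fine-tuned VLA: succeeds only from the start pose seen in training."""
    spec = ATOMIC_TASKS[task]
    if state.gripper_at != spec["start"]:
        state.log.append(f"VLA failed: {task}")
        return False
    state.gripper_at = spec["end"]
    state.log.append(f"VLA ok: {task}")
    return True


def plan_transition(prev_task: Optional[str], next_task: str, state: RobotState) -> str:
    """Stand-in for the LLM agent: returns transition code as a string.

    A real system would prompt an LLM with prev_task, next_task, and scene info.
    """
    target = ATOMIC_TASKS[next_task]["start"]
    if state.gripper_at == target:
        return ""  # no transition needed
    return f"move_to({target!r})"


def execute_transition(code: str, state: RobotState) -> None:
    """Run generated code with access to a single robot primitive."""
    def move_to(location: str) -> None:
        state.gripper_at = location
        state.log.append(f"transition: move_to {location}")

    if code:
        # Emptying builtins limits what the snippet can call; it is not a real sandbox.
        exec(code, {"__builtins__": {}}, {"move_to": move_to})


def run_sequence(tasks: List[str], use_plugin: bool) -> int:
    """Execute atomic tasks in order; return how many succeeded."""
    state = RobotState()
    successes = 0
    prev: Optional[str] = None
    for task in tasks:
        if use_plugin:
            execute_transition(plan_transition(prev, task, state), state)
        if run_vla_policy(task, state):
            successes += 1
        prev = task
    return successes


if __name__ == "__main__":
    sequence = ["pick_up_beaker", "pour_into_flask", "press_stirrer_button"]
    print("without plugin:", run_sequence(sequence, use_plugin=False))
    print("with plugin:", run_sequence(sequence, use_plugin=True))
```

In this toy setting the policy without the plugin completes only the first task, because every later task starts from a pose it never saw in training. Inserting generated transition code between tasks restores the expected start pose, and no policy weights change.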

Problem

Research questions and friction points this paper is trying to address.

long-horizon tasks

vision-language-action models

distributional mismatch

scientific experiments

robotic laboratories

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic VLA

Long-horizon tasks

Transition inference

Robotic laboratories

Vision-language-action models
