VILAS: A VLA-Integrated Low-cost Architecture with Soft Grasping for Robotic Manipulation

📅 2026-05-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of deploying safe and efficient end-to-end vision-language-action (VLA) robotic manipulation on low-cost hardware, with a focus on grasping delicate objects. It presents a modular, integrated platform combining a Fairino FR5 collaborative robot arm, a dual-camera perception module, a Jodell RG52-50 electric gripper fitted with a kirigami-based soft compliant extension, and a unified ZeroMQ (ZMQ) communication framework that coordinates teleoperation, data collection, and policy deployment. Three VLA models (pi_0, pi_0.5, and GR00T N1.6) are fine-tuned from publicly released pretrained checkpoints on the same teleoperated demonstration dataset. The gripper extension provides gentle, repeatable contact without force sensors, and experiments on a grape grasping task show that capable VLA policies can be trained and deployed on affordable hardware. The results also offer practical insights into how current VLA models behave in real-world deployment.

📝 Abstract

We present VILAS, a fully low-cost, modular robotic manipulation platform designed to support end-to-end vision-language-action (VLA) policy learning and deployment on accessible hardware. The system integrates a Fairino FR5 collaborative arm, a Jodell RG52-50 electric gripper, and a dual-camera perception module, unified through a ZMQ-based communication architecture that seamlessly coordinates teleoperation, data collection, and policy deployment within a single framework. To enable safe manipulation of fragile objects without relying on explicit force sensing, we design a kirigami-based soft compliant gripper extension that induces predictable deformation under compressive loading, providing gentle and repeatable contact with delicate targets. We deploy and evaluate three state-of-the-art VLA models on the VILAS platform: pi_0, pi_0.5, and GR00T N1.6. All models are fine-tuned from publicly released pretrained checkpoints using an identical demonstration dataset collected via our teleoperation pipeline. Experiments on a grape grasping task validate the effectiveness of the proposed system, confirming that capable manipulation policies can be successfully trained and deployed on low-cost modular hardware. Our results further provide practical insights into the deployment characteristics of current VLA models in real-world settings.
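As a rough illustration of how one message format could serve teleoperation, data collection, and policy deployment alike, the sketch below defines an action envelope and a robot-side handler. The paper's actual message schema is not given here, so every name in it (`ActionMsg`, `RobotEndpoint`, the `"teleop"` and `"policy"` labels, the field names and units) is a hypothetical choice. To stay dependency-free, the sketch stops at the serialized bytes that would be handed to a ZMQ socket and does not open a socket.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical source labels; not taken from the paper.
TELEOP = "teleop"
POLICY = "policy"


@dataclass
class ActionMsg:
    source: str                 # who produced the action: TELEOP or POLICY
    joint_targets: List[float]  # target joint angles (units assumed: radians)
    gripper_closed: bool        # binary gripper command (assumed encoding)


def encode(msg: ActionMsg) -> bytes:
    """Serialize to the bytes a ZMQ socket would transmit."""
    return json.dumps(asdict(msg)).encode("utf-8")


def decode(raw: bytes) -> ActionMsg:
    """Rebuild the message on the receiving side."""
    return ActionMsg(**json.loads(raw.decode("utf-8")))


class RobotEndpoint:
    """Stand-in for the robot-side process.

    Every received action is 'executed' (appended to a list here);
    teleoperated actions are additionally logged as demonstrations
    when recording is enabled.
    """

    def __init__(self, record: bool = False) -> None:
        self.record = record
        self.executed: List[ActionMsg] = []
        self.dataset: List[ActionMsg] = []

    def handle(self, raw: bytes) -> None:
        msg = decode(raw)
        self.executed.append(msg)
        if self.record and msg.source == TELEOP:
            self.dataset.append(msg)
```

Because the teleoperation device and a learned policy emit the same envelope in this sketch, the robot-side code path stays the same whether the action comes from a human operator or from a fine-tuned model. That is one plausible reading of coordinating all three roles within a single framework.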

Problem

Research questions and friction points this paper is trying to address.

robotic manipulation

vision-language-action

low-cost robotics

soft grasping

fragile objects

Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language-action (VLA)

soft compliant gripper

low-cost robotic platform