VacuumVLA: Boosting VLA Capabilities via a Unified Suction and Gripping Tool for Complex Robotic Manipulation

📅 2025-11-26
🤖 AI Summary
Most existing vision-language-action (VLA) systems use parallel two-finger grippers, which limits them in tasks that need large-area contact or adhesion, such as wiping glass or opening drawers without handles. To address this, the authors propose a low-cost, integrated end effector that combines a mechanical two-finger gripper with a vacuum suction unit, so the robot can switch between gripping and suction or use both together. The end effector is validated within two VLA frameworks, DexVLA and Pi0. Experiments show that robots equipped with it can complete multiple complex tasks that are infeasible with a conventional two-finger gripper alone. The authors state that all hardware designs and control systems will be released.

📝 Abstract
Vision-Language-Action (VLA) models have significantly advanced general-purpose robotic manipulation by harnessing large-scale pretrained vision and language representations. Most current VLA systems employ parallel two-finger grippers as their default end effectors. However, such grippers face inherent limitations in certain real-world tasks, such as wiping glass surfaces or opening drawers without handles, because they provide insufficient contact area or lack adhesion. To overcome these challenges, we present a low-cost, integrated hardware design that combines a mechanical two-finger gripper with a vacuum suction unit, enabling dual-mode manipulation within a single end effector. Our system supports flexible switching between the two modalities as well as their synergistic use, expanding the range of feasible tasks. We validate the efficiency and practicality of our design within two state-of-the-art VLA frameworks: DexVLA and Pi0. Experimental results demonstrate that, with the proposed hybrid end effector, robots can successfully perform multiple complex tasks that are infeasible for conventional two-finger grippers alone. All hardware designs and control systems will be released.
Problem

Research questions and friction points this paper is trying to address.

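Two-finger grippers lack the contact area or adhesion needed for tasks such as wiping glass or opening handleless drawers
Most current VLA systems default to parallel two-finger grippers, which narrows the range of feasible tasks
Flexible task execution in VLA robot systems calls for an end effector that offers more than one manipulation mode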
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines a mechanical two-finger gripper with a vacuum suction unit
Enables dual-mode manipulation within a single end effector
Supports flexible switching between gripping and suction, or using both together
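
The dual-mode idea can be illustrated with a minimal Python sketch. The summary does not describe how DexVLA or Pi0 encode the end-effector command, so everything here is an assumption made for illustration: the last two entries of the policy's action vector are taken to be a gripper closure value and a suction signal, and the names `Mode`, `EndEffectorCommand`, `decode_end_effector`, and `suction_threshold` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Mode(Enum):
    """Hypothetical labels for the end-effector operating modes."""
    IDLE = "idle"
    GRIP = "grip"
    SUCTION = "suction"
    BOTH = "both"


@dataclass(frozen=True)
class EndEffectorCommand:
    gripper_closure: float  # assumed convention: 0.0 = fully open, 1.0 = fully closed
    suction_on: bool        # assumed convention: True = vacuum pump enabled

    @property
    def mode(self) -> Mode:
        # Treat the gripper as "gripping" once it is more than half closed.
        gripping = self.gripper_closure > 0.5
        if gripping and self.suction_on:
            return Mode.BOTH
        if gripping:
            return Mode.GRIP
        if self.suction_on:
            return Mode.SUCTION
        return Mode.IDLE


def decode_end_effector(
    action: Sequence[float], suction_threshold: float = 0.5
) -> EndEffectorCommand:
    """Split the last two action entries into gripper and suction commands.

    Assumed layout (not taken from the paper): [..., gripper_closure, suction_signal].
    """
    if len(action) < 2:
        raise ValueError("action needs at least two end-effector dimensions")
    # Clamp the closure value into [0, 1] and threshold the suction signal.
    closure = min(max(float(action[-2]), 0.0), 1.0)
    suction_on = float(action[-1]) > suction_threshold
    return EndEffectorCommand(closure, suction_on)
```

Under these assumptions, `decode_end_effector([0.0] * 6 + [0.9, 0.8]).mode` evaluates to `Mode.BOTH`, which corresponds to using the two modalities together, while an action ending in `[0.0, 1.0]` selects suction alone.
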
Hui Zhou
The Chinese University of Hong Kong, Hong Kong SAR, China
Siyuan Huang
Shanghai Jiao Tong University, Shanghai, China
Minxing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Hao Zhang
The Chinese University of Hong Kong, Hong Kong SAR, China
Lue Fan
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Shaoshuai Shi
Didi Chuxing, Max Planck Institute for Informatics
Computer Vision, Deep Learning, Autonomous Driving