SkillVLA: Tackling Combinatorial Diversity in Dual-Arm Manipulation via Skill Reuse

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of combinatorial explosion in bimanual manipulation, where naively combining single-arm skills leads to poor generalization and redundant learning. To overcome this, we propose the first bimanual vision-language-action (VLA) model that explicitly models skill reuse by decoupling the representations of left- and right-arm skills, enabling efficient recombination of learned single-arm skills in novel bimanual configurations. Our approach employs a modular architecture that supports joint vision-language-action modeling and achieves substantial improvements in task success without retraining, raising the success rate on compositional tasks from 0% to 51%, while demonstrating strong generalization in collaborative and long-horizon scenarios.

📝 Abstract
Recent progress in vision-language-action (VLA) models has demonstrated strong potential for dual-arm manipulation, enabling complex behaviors and generalization to unseen environments. However, mainstream bimanual VLA formulations largely overlook the critical challenge of combinatorial diversity. Different pairings of single-arm behaviors can induce qualitatively distinct task behaviors, yet existing models do not explicitly account for this structure. We argue that effective bimanual VLAs should support skill reuse - the ability to recombine previously learned single-arm skills across novel left-right pairings - thereby avoiding the need to separately learn every possible combination. Current VLA designs entangle skills across arms, preventing such recomposition and limiting scalability. To address this limitation, we propose SkillVLA, a framework explicitly designed to enable skill reuse in dual-arm manipulation. Extensive experiments demonstrate that SkillVLA substantially improves skill composition, increasing overall success rate from 0% to 51%, and achieves strong performance on cooperative and long-horizon tasks.
Problem

Research questions and friction points this paper is trying to address.

combinatorial diversity
dual-arm manipulation
skill reuse
vision-language-action models
bimanual coordination
Innovation

Methods, ideas, or system contributions that make the work stand out.

skill reuse
combinatorial diversity
dual-arm manipulation
vision-language-action models
skill composition
Xuanran Zhai
National University of Singapore
Zekai Huang
National University of Singapore
Longyan Wu
Shanghai Innovation Institute, Fudan University
Qianyou Zhao
Shanghai Jiao Tong University
Qiaojun Yu
Shanghai Jiao Tong University, Shanghai AI Lab
robotic learning, 3D vision, VLA
Jieji Ren
Shanghai Jiao Tong University
Ce Hao
National University of Singapore
Harold Soh
Associate Professor at National University of Singapore
Human Robot Interaction, Machine Learning, Tactile Perception, Artificial Intelligence, Robotics