SkillVLA: Tackling Combinatorial Diversity in Dual-Arm Manipulation via Skill Reuse

📅 2026-03-04
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses combinatorial diversity in bimanual manipulation: existing models entangle skills across arms, so each left-right pairing of single-arm behaviors must be learned separately, which is redundant and generalizes poorly. The authors propose SkillVLA, presented as the first bimanual vision-language-action (VLA) model that explicitly models skill reuse. It decouples the representations of left- and right-arm skills so that learned single-arm skills can be recombined in novel bimanual configurations. The modular architecture supports joint vision-language-action modeling and raises success on compositional tasks from 0% to 51% without retraining, while also generalizing well to collaborative and long-horizon scenarios.

📝 Abstract
Recent progress in vision-language-action (VLA) models has demonstrated strong potential for dual-arm manipulation, enabling complex behaviors and generalization to unseen environments. However, mainstream bimanual VLA formulations largely overlook the critical challenge of combinatorial diversity. Different pairings of single-arm behaviors can induce qualitatively distinct task behaviors, yet existing models do not explicitly account for this structure. We argue that effective bimanual VLAs should support skill reuse - the ability to recombine previously learned single-arm skills across novel left-right pairings - thereby avoiding the need to separately learn every possible combination. Current VLA designs entangle skills across arms, preventing such recomposition and limiting scalability. To address this limitation, we propose SkillVLA, a framework explicitly designed to enable skill reuse in dual-arm manipulation. Extensive experiments demonstrate that SkillVLA substantially improves skill composition, increasing overall success rate from 0% to 51%, and achieves strong performance on cooperative and long-horizon tasks.
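
The scaling argument in the abstract can be made concrete with a small counting sketch. The skill names and helper functions below are hypothetical illustrations, not taken from the paper; the sketch only compares how many left-right pairings exist against how many single-arm skills would need to be learned if skills could be recombined freely.

```python
from itertools import product

# Hypothetical single-arm skill vocabularies (illustrative names, not from the paper).
LEFT_SKILLS = ["hold", "push", "pour"]
RIGHT_SKILLS = ["pick", "wipe", "open", "stir"]


def count_pairings(left, right):
    """Number of distinct left-right pairings a bimanual policy may be asked to perform."""
    return len(left) * len(right)


def count_single_arm_skills(left, right):
    """Number of single-arm skills to learn if they can be recombined across arms."""
    return len(left) + len(right)


def unseen_pairings(left, right, demonstrated):
    """Pairings that never appeared together in training; skill reuse targets these."""
    return [pair for pair in product(left, right) if pair not in demonstrated]


# Assumed training set: only three pairings were ever demonstrated together.
demonstrated = {("hold", "pick"), ("push", "wipe"), ("pour", "stir")}

print(count_pairings(LEFT_SKILLS, RIGHT_SKILLS))
print(count_single_arm_skills(LEFT_SKILLS, RIGHT_SKILLS))
print(len(unseen_pairings(LEFT_SKILLS, RIGHT_SKILLS, demonstrated)))
```

With 3 left-arm and 4 right-arm skills there are 12 pairings but only 7 single-arm skills, and the gap widens multiplicatively as either vocabulary grows. A model that entangles skills across arms would need demonstrations for each of the 9 undemonstrated pairings, whereas a model that supports skill reuse could in principle compose them from skills it has already learned.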

Problem

Research questions and friction points this paper is trying to address.

combinatorial diversity
dual-arm manipulation
skill reuse
vision-language-action models
bimanual coordination

Innovation

Methods, ideas, or system contributions that make the work stand out.

skill reuse
combinatorial diversity
dual-arm manipulation
vision-language-action models
skill composition