Concurrent Prehensile and Nonprehensile Manipulation: A Practical Approach to Multi-Stage Dexterous Tasks

📅 2026-03-12

🤖 AI Summary

This work addresses the challenge of performing long-horizon, multi-stage manipulation tasks with dexterous hands. Such tasks combine grasping (prehensile) and non-grasping (nonprehensile) actions, often concurrently, as when the hand holds one object while interacting with another. They must also cope with complex contact dynamics and scarce demonstration data. The authors propose DexMulti, a modular framework that decomposes demonstrations into object-centric skill units with well-defined temporal boundaries. DexMulti retrieves a demonstrated skill based on the current object's geometry, aligns it to the observed object state using an uncertainty-aware estimate of the object's centroid and yaw angle, and then executes it (a retrieve-align-execute paradigm). This design substantially reduces reliance on demonstrations. Using only 3–4 demonstrations per object, the method achieves an average success rate of 66% on training objects over more than 1,000 real-world trials and outperforms diffusion-policy baselines by 2–3×. It also generalizes robustly to unseen objects and to spatial displacements of up to ±25 cm.
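The summary mentions an uncertainty-aware estimate of the object's centroid and yaw. The estimator itself is not described on this page, so the following is only a minimal sketch under assumed simplifications: a static object, independent x, y and yaw axes with scalar variances, an identity measurement model, and a per-axis Kalman update. `PoseEstimator`, `process_var`, and the thresholds in `is_confident` are hypothetical. The yaw residual is wrapped to [-π, π) so that readings of 179° and -179° are treated as 2° apart rather than 358°.

```python
import math
from dataclasses import dataclass


def wrap_angle(a: float) -> float:
    """Wrap an angle (radians) to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


@dataclass
class PoseEstimator:
    """Hypothetical per-axis Kalman filter over (x, y, yaw) of a static object.

    Assumes independent axes (diagonal covariance) and an identity measurement
    model; these are illustrative choices, not DexMulti's published estimator.
    """
    mean: list                 # [x, y, yaw] in metres / radians
    var: list                  # per-axis variance of the current estimate
    process_var: float = 1e-4  # variance added each step to allow slow drift

    def update(self, meas, meas_var):
        """Fuse one noisy observation [x, y, yaw] with per-axis variances."""
        for i in range(3):
            prior_var = self.var[i] + self.process_var  # predict step (static model)
            innov = meas[i] - self.mean[i]
            if i == 2:
                innov = wrap_angle(innov)  # yaw residual must respect wrap-around
            gain = prior_var / (prior_var + meas_var[i])
            self.mean[i] += gain * innov
            if i == 2:
                self.mean[i] = wrap_angle(self.mean[i])
            self.var[i] = (1.0 - gain) * prior_var

    def is_confident(self, max_std_xy=0.01, max_std_yaw=math.radians(5)):
        """Gate execution on the estimate being tight enough (hypothetical thresholds)."""
        return (math.sqrt(max(self.var[0], self.var[1])) <= max_std_xy
                and math.sqrt(self.var[2]) <= max_std_yaw)


# Usage: fuse repeated noisy detections of an object at (0.30 m, -0.10 m, 0.5 rad).
est = PoseEstimator(mean=[0.0, 0.0, 0.0], var=[1.0, 1.0, 1.0])
for _ in range(50):
    est.update([0.30, -0.10, 0.5], [1e-4, 1e-4, 1e-3])
print(est.mean, est.is_confident())
```

Gating on the posterior variance is one simple way to make the alignment step "uncertainty-aware": a skill is only aligned and executed once the pose estimate is tight enough.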

📝 Abstract

Dexterous hands enable concurrent prehensile and nonprehensile manipulation, such as holding one object while interacting with another, a capability essential for everyday tasks yet underexplored in robotics. Learning such long-horizon, contact-rich multi-stage behaviors is challenging because demonstrations are expensive to collect and end-to-end policies require substantial data to generalize across varied object geometries and placements. We present DexMulti, a sample-efficient approach for real-world dexterous multi-task manipulation that decomposes demonstrations into object-centric skills with well-defined temporal boundaries. Rather than learning monolithic policies, our method retrieves demonstrated skills based on current object geometry, aligns them to the observed object state using an uncertainty-aware estimator that tracks centroid and yaw, and executes them via a retrieve-align-execute paradigm. We evaluate on three multi-stage tasks requiring concurrent manipulation (Grasp + Pull, Grasp + Open, and Grasp + Grasp) across two dexterous hands (Allegro and LEAP) in over 1,000 real-world trials. Our approach achieves an average success rate of 66% on training objects with only 3–4 demonstrations per object, outperforming diffusion policy baselines by 2–3× while requiring far fewer demonstrations. Results demonstrate robust generalization to held-out objects and spatial variations up to ±25 cm.
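The retrieve-align-execute loop described in the abstract can be sketched as follows. This is not the authors' code: it assumes each demonstrated skill is stored with a geometry descriptor (here, hypothetical bounding-box dimensions), the object's pose (centroid and yaw) at demonstration time, and end-effector waypoints in the world frame. Retrieval is a plain nearest-neighbour lookup on the descriptor, and alignment applies the planar rigid transform that maps the demonstrated object pose onto the observed one. `Skill`, `retrieve`, and `align` are illustrative names, and the execute step (sending the aligned waypoints to the arm and hand controllers) is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Skill:
    """One demonstrated, object-centric skill segment (hypothetical structure)."""
    name: str
    descriptor: tuple  # object geometry feature, e.g. bounding-box dims in metres
    demo_pose: tuple   # object (cx, cy, yaw) when the demo was recorded
    waypoints: list    # end-effector positions (x, y, z) in the world frame


def retrieve(library, descriptor):
    """Pick the skill whose object geometry is closest (Euclidean) to the query."""
    return min(library, key=lambda s: math.dist(s.descriptor, descriptor))


def align(skill, pose):
    """Re-express demo waypoints relative to the currently observed object pose.

    Applies the planar rigid transform that maps the demo object's centroid
    and yaw onto the observed (cx, cy, yaw); heights are left unchanged.
    """
    dcx, dcy, dyaw = skill.demo_pose
    cx, cy, yaw = pose
    c, s = math.cos(yaw - dyaw), math.sin(yaw - dyaw)
    out = []
    for x, y, z in skill.waypoints:
        rx, ry = x - dcx, y - dcy           # offset from the demo object centroid
        out.append((cx + c * rx - s * ry,   # rotate by the yaw change...
                    cy + s * rx + c * ry,   # ...then translate to the new centroid
                    z))
    return out


# Usage with a toy two-skill library.
library = [
    Skill("grasp_mug", (0.08, 0.08, 0.10), (0.0, 0.0, 0.0), [(0.10, 0.0, 0.05)]),
    Skill("grasp_box", (0.20, 0.10, 0.05), (0.0, 0.0, 0.0), [(0.0, 0.10, 0.03)]),
]
skill = retrieve(library, (0.09, 0.08, 0.11))  # closest geometry: grasp_mug
plan = align(skill, (0.25, 0.0, math.pi / 2))  # object moved 25 cm, rotated 90 degrees
```

Expressing waypoints relative to the object's centroid and yaw is what lets one demonstration be reused after the object is displaced or rotated; a full system would also need to re-orient the hand and handle contact, which this sketch ignores.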

Problem

Research questions and friction points this paper is trying to address.

dexterous manipulation

concurrent prehensile and nonprehensile manipulation

multi-stage tasks

sample-efficient learning

contact-rich manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

dexterous manipulation

skill decomposition

retrieve-align-execute
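The "skill decomposition" entry refers to splitting each demonstration into object-centric skills with well-defined temporal boundaries. How DexMulti detects those boundaries is not stated here, so the sketch below simply assumes a per-frame label naming the object the hand is currently acting on (a hypothetical annotation, for example derived from contact detection) and turns it into `(object, start, end)` segments, merging runs shorter than `min_len` frames into the preceding segment as label noise. It also simplifies concurrent manipulation to one active object per frame: in Grasp + Pull, for instance, the label would switch from the grasped object to the pulled one even though the first is still held. `segment_skills` and `min_len` are illustrative names.

```python
from itertools import groupby


def segment_skills(active_object, min_len=5):
    """Split a demo into object-centric segments with explicit [start, end) bounds.

    active_object[t] names the object the hand is interacting with at frame t
    (None for free-space motion). Runs shorter than min_len frames are treated
    as label noise and absorbed into the preceding segment.
    """
    segments = []
    t = 0
    for label, run in groupby(active_object):
        n = sum(1 for _ in run)
        start, end = t, t + n
        t = end
        if segments and (n < min_len or segments[-1][0] == label):
            # Absorb a short blip, or a same-label run following a blip, into the previous segment.
            prev_label, prev_start, _ = segments[-1]
            segments[-1] = (prev_label, prev_start, end)
        else:
            segments.append((label, start, end))
    # Keep only object-centric segments; free-space motion is dropped.
    return [s for s in segments if s[0] is not None]


demo = [None] * 3 + ["drawer"] * 20 + ["mug"] * 2 + ["drawer"] * 10 + [None] * 6 + ["mug"] * 15
print(segment_skills(demo))  # [('drawer', 3, 35), ('mug', 41, 56)]
```

The 2-frame "mug" blip inside the drawer interaction is merged away, so each returned segment covers one contiguous interaction with a single object and can be stored as a retrievable skill unit.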