Learning Robust Dexterous In-Hand Manipulation from Joint Sensors with Proprioceptive Transformer

📅 2026-05-20

🤖 AI Summary
This work addresses the underuse of readily available joint-sensor information in tendon-driven dexterous hand manipulation, a field that typically relies heavily on external perception. The authors propose a vision- and touch-free approach that achieves continuous in-hand object rotation using only temporal sequences of joint positions and velocities. A teacher policy trained via reinforcement learning with privileged object information is distilled into a Proprioceptive Transformer (PT) that operates exclusively on proprioceptive inputs. The method implicitly infers object state from joint data alone and, for the first time, demonstrates high-performance manipulation on the physical ORCA hand using only joint sensing. Experiments show a 3.1× higher rotation speed than baseline methods and a 23.4% lower RMSE for cube position estimation than an MLP baseline.
๐Ÿ“ Abstract
In-hand object manipulation is a fundamental yet challenging capability for dexterous robots. Despite significant progress in dexterous manipulation, existing approaches rely heavily on vision or tactile sensing to track object states, while joint sensing -- the most readily available modality on any robotic hand -- remains largely overlooked, particularly for tendon-driven hands. In this paper, we study how far joint sensing alone can go by asking: (i) whether motor encoders or direct joint sensing provides better proprioceptive feedback, (ii) how to extract environment information from joint measurements, and (iii) whether joint-only control can achieve competitive real-world performance without external perception. We present the Proprioceptive Transformer (PT), an exteroceptive-free approach for continuous cube rotation on a tendon-driven dexterous hand that uses only joint sensing feedback. A teacher policy is first trained via reinforcement learning with privileged object information, then distilled into PT, which operates solely on joint position and velocity histories. The Transformer architecture effectively extracts implicit object state information from temporal patterns in joint sensor readings. Experiments on the real ORCA hand show that our approach achieves 3.1x higher rotation speed than baselines. We also demonstrate that our PT achieves a 23.4% lower RMSE for cube position estimation than the MLP baseline, indicating superior extraction of exteroceptive information from proprioceptive sources.
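The distillation setup described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the joint count, history length, embedding width, single attention layer, random weights, and mean-squared-error imitation loss are all assumptions chosen for illustration, and the teacher action is a random stand-in for the output of the privileged RL teacher. The sketch only shows the data flow: joint position and velocity histories become one token per time step, a self-attention layer mixes information across time, and the student's action is compared against the teacher's.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical and not taken from the paper.
NUM_JOINTS = 17   # number of sensed joints
HISTORY = 8       # number of past time steps fed to the student
D_MODEL = 32      # token embedding width
NUM_ACTIONS = 17  # action dimension


def make_tokens(q_hist, dq_hist):
    """Stack joint positions and velocities into one token per time step."""
    return np.concatenate([q_hist, dq_hist], axis=-1)  # (HISTORY, 2 * NUM_JOINTS)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(scale=0.1):
    """Random weights for a single-layer, single-head attention student."""
    shapes = {
        "embed": (2 * NUM_JOINTS, D_MODEL),
        "pos": (HISTORY, D_MODEL),       # learned positional embedding
        "wq": (D_MODEL, D_MODEL),
        "wk": (D_MODEL, D_MODEL),
        "wv": (D_MODEL, D_MODEL),
        "head": (D_MODEL, NUM_ACTIONS),
    }
    return {name: rng.normal(0.0, scale, shape) for name, shape in shapes.items()}


def attention_weights(params, tokens):
    """Self-attention weights over the time steps of the history."""
    x = tokens @ params["embed"] + params["pos"]
    q, k = x @ params["wq"], x @ params["wk"]
    return softmax(q @ k.T / np.sqrt(D_MODEL)), x


def student_forward(params, tokens):
    """Map a proprioceptive history to an action (read from the latest step)."""
    attn, x = attention_weights(params, tokens)
    x = x + attn @ (x @ params["wv"])    # residual self-attention update
    return x[-1] @ params["head"]


def distillation_loss(student_action, teacher_action):
    """Mean-squared error between student and teacher actions (assumed loss)."""
    return float(np.mean((student_action - teacher_action) ** 2))


# Synthetic stand-ins for recorded joint histories and the teacher's action.
q_hist = rng.uniform(-1.0, 1.0, (HISTORY, NUM_JOINTS))
dq_hist = rng.uniform(-1.0, 1.0, (HISTORY, NUM_JOINTS))
teacher_action = rng.uniform(-1.0, 1.0, NUM_ACTIONS)

params = init_params()
tokens = make_tokens(q_hist, dq_hist)
student_action = student_forward(params, tokens)
loss = distillation_loss(student_action, teacher_action)
print(f"distillation loss on one sample: {loss:.4f}")
```

In a real training loop this loss would be minimized over many teacher rollouts with a gradient-based optimizer; the sketch omits training entirely and evaluates the loss once on synthetic data.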
Problem

Research questions and friction points this paper is trying to address.

in-hand manipulation
proprioceptive sensing
joint sensing
dexterous robot
exteroceptive-free control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proprioceptive Transformer
joint sensing
in-hand manipulation
tendon-driven hand
sensorimotor policy distillation