🤖 AI Summary
This work addresses the challenge of stable control in multi-fingered dexterous hands during contact-rich manipulation, where performance is highly sensitive to dynamic multi-point contact states that vary with object geometry and friction. The authors propose a fused vision–tactile generative control strategy that, for the first time, leverages tactile signals to explicitly model the evolution of contact states. Employing a conditional diffusion model in a joint tactile–state latent space, the method predicts coupled robot-state and tactile trajectories and introduces a contact-consistency mapping that translates them into executable targets for a compliance controller. Moving beyond conventional approaches that treat tactile feedback merely as an observational input, the framework achieves high-precision contact control and significantly outperforms existing vision-only and vision–tactile baselines on both a real four-finger Allegro V5 hand and a simulated five-finger Tesollo DG-5F hand.
📝 Abstract
Contact-rich dexterous manipulation with multi-finger hands remains an open challenge in robotics because task success depends on multi-point contacts that continuously evolve and are highly sensitive to object geometry, frictional transitions, and slip. Tactile-informed manipulation policies have recently shown promise; however, most use tactile signals as additional observations rather than modeling the contact state or how their action outputs interact with low-level controller dynamics. We present Contact-Grounded Policy (CGP), a visuotactile policy that grounds multi-point contacts by predicting coupled trajectories of the actual robot state and tactile feedback, and by using a learned contact-consistency mapping to convert these predictions into executable target robot states for a compliance controller. CGP consists of two components: (i) a conditional diffusion model that forecasts future robot state and tactile feedback in a compressed latent space, and (ii) a learned contact-consistency mapping that converts each predicted robot state–tactile pair into executable targets for a compliance controller, enabling it to realize the intended contacts. We evaluate CGP on a physical four-finger Allegro V5 hand with Digit360 fingertip tactile sensors and on a simulated five-finger Tesollo DG-5F hand with dense whole-hand tactile arrays. Across a range of dexterous tasks, including in-hand manipulation, delicate grasping, and tool use, CGP outperforms visuomotor and visuotactile diffusion-policy baselines.
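The two-stage inference flow the abstract describes, latent forecasting with a conditional diffusion model followed by a contact-consistency mapping into controller targets, can be sketched in a few lines of NumPy. Everything here is an illustrative assumption rather than the authors' implementation: the dimensions, the random linear maps standing in for trained networks, and the toy `denoise_step` stub are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 16 joint states, 24-dim
# tactile reading, 8-dim joint latent, and a 12-step prediction horizon.
STATE_DIM, TACTILE_DIM, LATENT_DIM, HORIZON = 16, 24, 8, 12

# Random linear maps stand in for the learned encoder/decoder and the
# contact-consistency mapping; a trained model would use neural networks.
W_enc = rng.normal(size=(STATE_DIM + TACTILE_DIM, LATENT_DIM)) * 0.1
W_dec = rng.normal(size=(LATENT_DIM, STATE_DIM + TACTILE_DIM)) * 0.1
W_map = rng.normal(size=(STATE_DIM + TACTILE_DIM, STATE_DIM)) * 0.1

def encode(state, tactile):
    """Compress an observed (robot state, tactile) pair into the joint latent."""
    return np.concatenate([state, tactile]) @ W_enc

def denoise_step(z, cond, t):
    """Toy stand-in for one reverse-diffusion step conditioned on the
    observation latent: pull each noisy sample toward the condition.
    A real model would predict noise with a trained denoising network."""
    alpha = t / 10.0
    return alpha * z + (1 - alpha) * cond

def forecast(cond, horizon=HORIZON, steps=10):
    """Sample a latent trajectory of future (state, tactile) pairs."""
    traj = rng.normal(size=(horizon, LATENT_DIM))   # start from pure noise
    for t in range(steps, 0, -1):                   # reverse diffusion
        traj = np.stack([denoise_step(z, cond, t) for z in traj])
    return traj

def contact_consistency_map(latent_traj):
    """Convert predicted state-tactile latents into executable target robot
    states for the downstream compliance controller."""
    decoded = latent_traj @ W_dec                   # back to state+tactile
    return decoded @ W_map                          # target robot states

# One inference pass: observe, forecast coupled futures, map to targets.
state = rng.normal(size=STATE_DIM)
tactile = rng.normal(size=TACTILE_DIM)
cond = encode(state, tactile)
targets = contact_consistency_map(forecast(cond))
print(targets.shape)  # one target robot state per future step
```

The point of the sketch is the data flow: tactile feedback enters the forecast itself rather than serving only as an extra observation, and the final mapping is what turns the predicted state-tactile pair into something a compliance controller can actually track.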