🤖 AI Summary
Current robot learning lacks large-scale, multimodal interaction datasets for deformable soft objects under diverse contact pressures; existing datasets focus predominantly on rigid objects, limiting the modeling of rich, real-world tactile interactions. Method: We introduce the first humanoid vision-tactile-action teleoperation dataset designed specifically for compliant objects. Built on a dexterous humanoid platform, it synchronously captures high-resolution visual data, high-fidelity tactile signals (including distributed force and deformation), and fine-grained motor trajectories across varying contact pressures. Contribution/Results: The dataset is large-scale, scene-diverse, and contact-intensive, and it uniquely characterizes multimodal soft-object responses under dynamic pressure, substantially improving tactile-signal modeling. Experiments demonstrate its effectiveness for tactile-driven closed-loop manipulation and cross-task generalization, establishing a foundational resource for joint perception-decision-control modeling in soft-object manipulation.
📝 Abstract
Contact-rich manipulation has become increasingly important in robot learning. However, previous robot learning datasets have focused on rigid objects and underrepresent the diversity of contact-pressure conditions found in real-world manipulation. To address this gap, we present a humanoid visual-tactile-action dataset designed for manipulating deformable soft objects. The dataset was collected via teleoperation with a humanoid robot equipped with dexterous hands, capturing multimodal interactions under varying pressure conditions. This work also motivates future research on models with advanced optimization strategies that can effectively leverage the complexity and diversity of tactile signals.
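To make the synchronized vision-tactile-action structure concrete, here is a minimal sketch of what one per-timestep record and an episode loader might look like. The source does not specify the dataset's actual schema; the class name `VTAFrame`, the field names, the shapes, and the helper `load_episode` are all illustrative assumptions.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class VTAFrame:
    """One hypothetical synchronized vision-tactile-action sample.

    Field names and shapes are illustrative assumptions,
    not the dataset's actual schema.
    """
    timestamp: float                 # capture time in seconds
    rgb: np.ndarray                  # e.g. (H, W, 3) uint8 camera image
    tactile_force: np.ndarray        # e.g. (n_taxels,) distributed contact force
    tactile_deformation: np.ndarray  # e.g. (n_taxels,) local surface deformation
    joint_positions: np.ndarray      # measured arm/hand joint angles
    joint_commands: np.ndarray       # commanded motor targets (the "action")

def load_episode(frames: list[VTAFrame]) -> dict[str, np.ndarray]:
    """Stack per-frame fields into time-major arrays for model training."""
    return {
        "rgb": np.stack([f.rgb for f in frames]),
        "force": np.stack([f.tactile_force for f in frames]),
        "deformation": np.stack([f.tactile_deformation for f in frames]),
        "qpos": np.stack([f.joint_positions for f in frames]),
        "action": np.stack([f.joint_commands for f in frames]),
    }
```

Keeping all modalities aligned on a shared timestamp is what enables the closed-loop, tactile-driven policies the summary describes: a learned policy can consume the visual and tactile streams at time t and be supervised by the teleoperator's motor command at the same step.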