AI Summary
This work addresses the limitation of existing tactile learning methods, which predominantly focus on object-level static properties and struggle to model fine-grained force and deformation dynamics during physical interactions. To bridge this gap, the authors introduce ToucHD, a large-scale hierarchical tactile dataset that, for the first time, encompasses atomic actions, real-world manipulations, and aligned force-tactile pairs. They further propose the AnyTouch 2 framework, which enables universal dynamic tactile representation learning across sensors by jointly modeling multi-frame deformations and explicit force dynamics. This approach unifies object-level understanding with fine-grained force perception, achieving strong performance in static attribute recognition, dynamic physical property estimation, and multi-level manipulation tasks. The method significantly enhances model generalization and robustness across diverse optical tactile sensors and complex interaction scenarios.
Abstract
Real-world contact-rich manipulation requires robots to perceive temporal tactile feedback, capture subtle surface deformations, and reason about object properties as well as force dynamics. Although optical tactile sensors are uniquely capable of providing such rich information, existing tactile datasets and models remain limited. These resources primarily focus on object-level attributes (e.g., material) while largely overlooking fine-grained tactile temporal dynamics during physical interactions. We argue that advancing dynamic tactile perception requires a systematic hierarchy of dynamic perception capabilities to guide both data collection and model design. To address the lack of tactile data with rich dynamic information, we present ToucHD, a large-scale hierarchical tactile dataset spanning tactile atomic actions, real-world manipulations, and touch-force paired data. Beyond scale, ToucHD establishes a comprehensive tactile dynamic data ecosystem that explicitly supports hierarchical perception capabilities from the data perspective. Building on it, we propose AnyTouch 2, a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. The framework captures both pixel-level and action-specific deformations across frames, while explicitly modeling physical force dynamics, thereby learning multi-level dynamic perception capabilities from the model perspective. We evaluate our model on benchmarks that cover static object properties and dynamic physical attributes, as well as real-world manipulation tasks spanning multiple tiers of dynamic perception capabilities, from basic object-level understanding to force-aware dexterous manipulation. Experimental results demonstrate consistent and strong performance across sensors and tasks.
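To make the dual objective of the abstract concrete, below is a minimal, hypothetical PyTorch sketch of a multi-frame tactile encoder with both an object-level classification head and a contact-force regression head. It is not the AnyTouch 2 architecture; the CNN backbone, GRU temporal aggregator, head dimensions, and class/force sizes are all illustrative assumptions.

```python
# Illustrative sketch only: a generic multi-frame tactile encoder with a
# force-regression head. This is NOT the authors' AnyTouch 2 design; all
# module and dimension choices are assumptions for demonstration.
import torch
import torch.nn as nn


class TactileDynamicsModel(nn.Module):
    """Encodes a short clip of tactile images and predicts both an
    object-level label (static understanding) and a contact-force vector
    (dynamic, force-aware perception)."""

    def __init__(self, num_classes: int = 10, force_dim: int = 3, embed_dim: int = 128):
        super().__init__()
        # Per-frame encoder: a small CNN standing in for any image backbone.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Temporal module: aggregates frame embeddings so the representation
        # reflects deformation dynamics across the clip, not a single frame.
        self.temporal = nn.GRU(embed_dim, embed_dim, batch_first=True)
        # Two heads: static object understanding and dynamic force estimation.
        self.cls_head = nn.Linear(embed_dim, num_classes)
        self.force_head = nn.Linear(embed_dim, force_dim)

    def forward(self, clip: torch.Tensor):
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        feats = self.frame_encoder(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, last = self.temporal(feats)      # final hidden state summarizes the clip
        summary = last.squeeze(0)
        return self.cls_head(summary), self.force_head(summary)


if __name__ == "__main__":
    model = TactileDynamicsModel()
    dummy_clip = torch.randn(2, 8, 3, 64, 64)   # 2 clips of 8 tactile frames each
    logits, force = model(dummy_clip)
    print(logits.shape, force.shape)            # (2, 10) and (2, 3)
```

Training such a model jointly on a classification loss and a force-regression loss is one plausible way to couple object-level understanding with fine-grained force perception; the paper's actual objectives, supervision signals, and sensor-alignment strategy should be taken from the full text.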