UniVTAC: A Unified Simulation Platform for Visuo-Tactile Manipulation Data Generation, Learning, and Benchmarking

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Contact-rich manipulation tasks are difficult to perform robustly using vision alone, yet the development of tactile-driven policies is hindered by the high cost of real-world tactile data collection and the absence of a unified evaluation platform. To address this, the paper proposes UniVTAC, the first unified visuo-tactile simulation platform: it supports diverse tactile sensors, enables scalable generation of controllable multimodal interaction data, and pairs a dedicated visuo-tactile encoder with a benchmark suite of eight representative tasks. On the UniVTAC Benchmark, the proposed approach improves average task success rates by 17.1%, and real-robot experiments show a further 25% gain, advancing standardized research on tactile perception for robotic manipulation.

📝 Abstract
Robotic manipulation has seen rapid progress with vision-language-action (VLA) policies. However, visuo-tactile perception is critical for contact-rich manipulation, as tasks such as insertion are difficult to complete robustly using vision alone. At the same time, acquiring large-scale and reliable tactile data in the physical world remains costly and challenging, and the lack of a unified evaluation platform further limits policy learning and systematic analysis. To address these challenges, we propose UniVTAC, a simulation-based visuo-tactile data synthesis platform that supports three commonly used visuo-tactile sensors and enables scalable and controllable generation of informative contact interactions. Based on this platform, we introduce the UniVTAC Encoder, a visuo-tactile encoder trained on large-scale simulation-synthesized data with designed supervisory signals, providing tactile-centric visuo-tactile representations for downstream manipulation tasks. In addition, we present the UniVTAC Benchmark, which consists of eight representative visuo-tactile manipulation tasks for evaluating tactile-driven policies. Experimental results show that integrating the UniVTAC Encoder improves average success rates by 17.1% on the UniVTAC Benchmark, while real-world robotic experiments further demonstrate a 25% improvement in task success. Our webpage is available at https://univtac.github.io/.
Problem

Research questions and friction points this paper is trying to address.

visuo-tactile manipulation
tactile data acquisition
robotic manipulation
simulation platform
benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

visuo-tactile manipulation
simulation-based data synthesis
tactile-centric representation
unified benchmark
multimodal encoder
👥 Authors

Baijun Chen · Nanjing University (Artificial Intelligence)
Weijie Wan · Shenzhen University
Tianxing Chen · The University of Hong Kong
Xianda Guo · PhD Student at Wuhan University (Stereo Matching, Depth Estimation, Gait Recognition)
Congsheng Xu · Undergraduate, SEIEE, Shanghai Jiao Tong University (Human Motion Generation, Digital Twin, Embodied AI)
Yuanyang Qi · ViTai Robotics
Haojie Zhang · ViTai Robotics
Longyan Wu · Fudan University
Tianling Xu · ScaleLab, Shanghai Jiao Tong University
Zixuan Li · Assistant Professor at ICT, UCAS (Knowledge Graph, Large Language Model)
Yizhe Wu · ViTai Robotics
Rui Li · ViTai Robotics
Xiaokang Yang · ScaleLab, Shanghai Jiao Tong University
Ping Luo · National University of Defense Technology (Distributed Computing)
Wei Sui · Horizon Robotics (3D Vision, BEV Perception, 3D Reconstruction)
Yao Mu · ScaleLab, Shanghai Jiao Tong University