ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning

📅 2025-05-24

📈 Citations: 0

✨ Influential: 0

career value

219K/year

🤖 AI Summary

Robust evaluation of robotic dexterous manipulation under weak visual conditions—such as occlusion, low illumination, and material sensitivity—lacks standardized, reproducible benchmarks. Method: We introduce ManiFeel, the first reproducible and scalable vision–tactile fusion simulation benchmark, built on PyBullet with high-fidelity physics. It supports standardized horizontal evaluation across 8 manipulation tasks, 5 tactile modalities, and 12 policy variants. We systematically compare tactile representations—including raw signals, event-driven encodings, and learned embeddings—and propose CNN+RNN/Transformer multimodal architectures with a unified evaluation protocol. Results: Experiments show tactile feedback improves task success rates by 23.6% on average in critical weak-visual scenarios; analysis reveals tactile representation fidelity and vision–tactile alignment as key performance bottlenecks. All data, training logs, and pretrained models are open-sourced, filling a critical gap in supervised visuotactile policy learning benchmarks.

Technology Category

Application Category

📝 Abstract

Supervised visuomotor policies have shown strong performance in robotic manipulation but often struggle in tasks with limited visual input, such as operations in confined spaces, dimly lit environments, or scenarios where perceiving the object's properties and state is critical for task success. In such cases, tactile feedback becomes essential for manipulation. While the rapid progress of supervised visuomotor policies has benefited greatly from high-quality, reproducible simulation benchmarks in visual imitation, the visuotactile domain still lacks a similarly comprehensive and reliable benchmark for large-scale and rigorous evaluation. To address this, we introduce ManiFeel, a reproducible and scalable simulation benchmark for studying supervised visuotactile manipulation policies across a diverse set of tasks and scenarios. ManiFeel presents a comprehensive benchmark suite spanning a diverse set of manipulation tasks, evaluating various policies, input modalities, and tactile representation methods. Through extensive experiments, our analysis reveals key factors that influence supervised visuotactile policy learning, identifies the types of tasks where tactile sensing is most beneficial, and highlights promising directions for future research in visuotactile policy learning. ManiFeel aims to establish a reproducible benchmark for supervised visuotactile policy learning, supporting progress in visuotactile manipulation and perception. To facilitate future research and ensure reproducibility, we will release our codebase, datasets, training logs, and pretrained checkpoints. Please visit the project website for more details: https://zhengtongxu.github.io/manifeel-website/

Problem

Research questions and friction points this paper is trying to address.

Developing benchmarks for visuotactile manipulation policy learning

Addressing limitations of visual input in confined or dim environments

Evaluating tactile feedback's role in diverse manipulation tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces ManiFeel benchmark for visuotactile policies

Evaluates diverse tasks and tactile representations

Provides code datasets and pretrained models

🔎 Similar Papers

MimicTouch: Leveraging Multi-modal Human Tactile Demonstrations for Contact-rich Manipulation