ManiFeel: Benchmarking and Understanding Visuotactile Manipulation Policy Learning

📅 2025-05-24
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Robust evaluation of robotic dexterous manipulation under weak visual conditions—such as occlusion, low illumination, and material sensitivity—lacks standardized, reproducible benchmarks. Method: We introduce ManiFeel, the first reproducible and scalable vision–tactile fusion simulation benchmark, built on PyBullet with high-fidelity physics. It supports standardized horizontal evaluation across 8 manipulation tasks, 5 tactile modalities, and 12 policy variants. We systematically compare tactile representations—including raw signals, event-driven encodings, and learned embeddings—and propose CNN+RNN/Transformer multimodal architectures with a unified evaluation protocol. Results: Experiments show tactile feedback improves task success rates by 23.6% on average in critical weak-visual scenarios; analysis reveals tactile representation fidelity and vision–tactile alignment as key performance bottlenecks. All data, training logs, and pretrained models are open-sourced, filling a critical gap in supervised visuotactile policy learning benchmarks.

Technology Category

Application Category

📝 Abstract
Supervised visuomotor policies have shown strong performance in robotic manipulation but often struggle in tasks with limited visual input, such as operations in confined spaces, dimly lit environments, or scenarios where perceiving the object's properties and state is critical for task success. In such cases, tactile feedback becomes essential for manipulation. While the rapid progress of supervised visuomotor policies has benefited greatly from high-quality, reproducible simulation benchmarks in visual imitation, the visuotactile domain still lacks a similarly comprehensive and reliable benchmark for large-scale and rigorous evaluation. To address this, we introduce ManiFeel, a reproducible and scalable simulation benchmark for studying supervised visuotactile manipulation policies across a diverse set of tasks and scenarios. ManiFeel presents a comprehensive benchmark suite spanning a diverse set of manipulation tasks, evaluating various policies, input modalities, and tactile representation methods. Through extensive experiments, our analysis reveals key factors that influence supervised visuotactile policy learning, identifies the types of tasks where tactile sensing is most beneficial, and highlights promising directions for future research in visuotactile policy learning. ManiFeel aims to establish a reproducible benchmark for supervised visuotactile policy learning, supporting progress in visuotactile manipulation and perception. To facilitate future research and ensure reproducibility, we will release our codebase, datasets, training logs, and pretrained checkpoints. Please visit the project website for more details: https://zhengtongxu.github.io/manifeel-website/
Problem

Research questions and friction points this paper is trying to address.

Developing benchmarks for visuotactile manipulation policy learning
Addressing limitations of visual input in confined or dim environments
Evaluating tactile feedback's role in diverse manipulation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces ManiFeel benchmark for visuotactile policies
Evaluates diverse tasks and tactile representations
Provides code datasets and pretrained models
🔎 Similar Papers
No similar papers found.