VE2VF: Vision-Enabled to Vision-Free Distillation via Real-world Reinforcement Learning for Robust Contact-Rich Manipulation

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited generalization of vision-enabled reinforcement learning policies in contact-rich robotic manipulation, which tend to overfit to the visual conditions seen during training. The authors propose a human-in-the-loop teacher–student distillation framework that transfers knowledge from a vision-enabled teacher policy to a vision-free student policy relying solely on pose, twist (linear and angular velocity), and wrench (force/torque) sensing. Training takes place entirely in the real world, without domain randomization or data augmentation. On the NIST assembly benchmark board, the method achieves 95% overall success after approximately 50 minutes of training on three representative tasks, including generalization to eight unseen task variants. Fine-tuning with distillation achieves full success on the most challenging task, and the resulting policies outperform baselines in both robustness and adaptability.

📝 Abstract

When using reinforcement learning (RL) for contact-rich robotic manipulation, vision can provide task-relevant information that accelerates learning beyond what proprioception alone can achieve. However, vision-enabled policies tend to overfit to the visual conditions seen during training, limiting their robustness and transferability. We present a human-in-the-loop RL framework that employs teacher-student distillation to achieve robust performance across multiple task variants, trained entirely in the real world without requiring domain randomization or data augmentation. A vision-enabled teacher distills its knowledge into a vision-free student that relies solely on pose, twist, and wrench sensing, combining fast training with strong task generalization. On the real-world NIST assembly benchmark board, our approach achieves 95% overall success after approximately 50 minutes of training on 3 representative tasks, including robust generalization to 8 unseen task variants. Fine-tuning with distillation achieves full success on the most challenging task. We demonstrate that the resulting policies outperform baselines in both robustness and adaptability.
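To make the teacher-student idea concrete, here is a minimal sketch of one common form of distillation: a student that sees only pose, twist, and wrench regresses onto the actions a teacher produced in the same states. The abstract does not specify the loss, network, or observation sizes, so everything here is an assumption for illustration: the dimensions (7, 6, 6), the linear student, the mean-squared-error objective, and the names `student_observation`, `distillation_step`, and `run_demo` are all hypothetical, and the "teacher" is a synthetic stand-in rather than a vision policy.

```python
import random

# Hypothetical sizes: pose as position + quaternion, twist and wrench as 6-vectors.
POSE_DIM, TWIST_DIM, WRENCH_DIM = 7, 6, 6
ACTION_DIM = 6


def student_observation(pose, twist, wrench):
    """Build the vision-free observation; no image features are included."""
    return list(pose) + list(twist) + list(wrench)


def predict(weights, obs):
    """Linear student: one weight row per action dimension."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


def distillation_step(weights, batch, lr=0.1):
    """One gradient step on the MSE between student and teacher actions.

    batch is a list of (student_obs, teacher_action) pairs.
    Returns (new_weights, loss), with loss measured before the update.
    """
    n = len(batch)
    grads = [[0.0] * len(row) for row in weights]
    loss = 0.0
    for obs, target in batch:
        pred = predict(weights, obs)
        for a, (p, t) in enumerate(zip(pred, target)):
            err = p - t
            loss += err * err
            for i, x in enumerate(obs):
                grads[a][i] += 2.0 * err * x
    new_weights = [
        [w - lr * g / n for w, g in zip(row, grow)]
        for row, grow in zip(weights, grads)
    ]
    return new_weights, loss / n


def run_demo(steps=200, n_samples=64, seed=0):
    """Distill a synthetic linear 'teacher' into the student; returns the loss curve."""
    rng = random.Random(seed)
    obs_dim = POSE_DIM + TWIST_DIM + WRENCH_DIM
    # Synthetic stand-in for teacher actions (not real robot data).
    teacher = [[rng.gauss(0, 1) for _ in range(obs_dim)] for _ in range(ACTION_DIM)]
    batch = []
    for _ in range(n_samples):
        obs = student_observation(
            [rng.gauss(0, 1) for _ in range(POSE_DIM)],
            [rng.gauss(0, 1) for _ in range(TWIST_DIM)],
            [rng.gauss(0, 1) for _ in range(WRENCH_DIM)],
        )
        batch.append((obs, predict(teacher, obs)))
    weights = [[0.0] * obs_dim for _ in range(ACTION_DIM)]
    losses = []
    for _ in range(steps):
        weights, loss = distillation_step(weights, batch)
        losses.append(loss)
    return losses
```

Calling `run_demo()` returns a loss curve that falls as the student matches the stand-in teacher. In a real system the teacher's actions would come from a policy that also observes camera images, so the student may not be able to reproduce them exactly from pose, twist, and wrench alone; the sketch sidesteps that by making the synthetic teacher a function of the student's own inputs.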

Problem

Research questions and friction points this paper is trying to address.

vision-enabled policy

overfitting

robustness

transferability

contact-rich manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-free policy

knowledge distillation

real-world reinforcement learning

contact-rich manipulation

robust generalization
