VE2VF: Vision-Enabled to Vision-Free Distillation via Real-world Reinforcement Learning for Robust Contact-Rich Manipulation

📅 2026-05-28
🤖 AI Summary
This work addresses the limited generalization of vision-enabled reinforcement learning policies in contact-rich robotic manipulation, which tend to overfit to the visual conditions seen during training. The authors propose a human-in-the-loop teacher-student distillation framework in which a vision-enabled teacher policy distills its knowledge into a vision-free student policy that relies solely on pose, twist, and wrench (force/torque) sensing. Training takes place entirely in the real world, without domain randomization or data augmentation. On the NIST assembly benchmark board, the method achieves 95% overall success after approximately 50 minutes of training on 3 representative tasks, including generalization to 8 unseen task variants. Fine-tuning with distillation achieves full success on the most challenging task, and the resulting policies outperform baselines in both robustness and adaptability.
📝 Abstract
When using reinforcement learning (RL) for contact-rich robotic manipulation, vision can provide task-relevant information that accelerates learning beyond what proprioception alone can achieve. However, vision-enabled policies tend to overfit to the visual conditions seen during training, limiting their robustness and transferability. We present a human-in-the-loop RL framework that employs teacher-student distillation to achieve robust performance across multiple task variants, trained entirely in the real world without requiring domain randomization or data augmentation. A vision-enabled teacher distills its knowledge into a vision-free student that relies solely on pose, twist, and wrench sensing, combining fast training with strong task generalization. On the real-world NIST assembly benchmark board, our approach achieves 95% overall success after approximately 50 minutes of training on 3 representative tasks, including robust generalization to 8 unseen task variants. Fine-tuning with distillation achieves full success on the most challenging task. We demonstrate that the resulting policies outperform baselines in both robustness and adaptability.
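To make the teacher-student idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear student, a plain squared-error imitation loss, and a stand-in linear teacher; the names `student_obs`, `predict`, `distill_step`, and `run_demo`, the observation keys, and the dimension constants are all hypothetical. The only point it illustrates is that the student is fitted to the teacher's actions while seeing nothing but pose, twist, and wrench.

```python
import random

# Assumed sizes: pose as position + quaternion, twist and wrench as 6-vectors.
POSE_DIM, TWIST_DIM, WRENCH_DIM, ACTION_DIM = 7, 6, 6, 6


def student_obs(full_obs):
    """Drop the image; keep only pose, twist, and wrench as one flat list."""
    return full_obs["pose"] + full_obs["twist"] + full_obs["wrench"]


def predict(weights, obs_vec):
    """Linear policy: one action component per weight row."""
    return [sum(w * x for w, x in zip(row, obs_vec)) for row in weights]


def distill_step(weights, obs_vec, teacher_action, lr=0.01):
    """One SGD step on the summed squared error to the teacher's action.

    Updates `weights` in place and returns the mean squared error
    measured before the update.
    """
    err = [p - t for p, t in zip(predict(weights, obs_vec), teacher_action)]
    for row, e in zip(weights, err):
        for j, x in enumerate(obs_vec):
            row[j] -= lr * 2.0 * e * x
    return sum(e * e for e in err) / len(err)


def run_demo(steps=3000, seed=0):
    """Distill a stand-in teacher into a vision-free linear student."""
    rng = random.Random(seed)
    n = POSE_DIM + TWIST_DIM + WRENCH_DIM
    # Hypothetical teacher: a fixed linear map. A real vision-enabled
    # teacher would also condition on the camera image.
    teacher_w = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(ACTION_DIM)]
    student_w = [[0.0] * n for _ in range(ACTION_DIM)]
    losses = []
    for _ in range(steps):
        obs = {
            "image": None,  # placeholder; the student never reads this key
            "pose": [rng.uniform(-1, 1) for _ in range(POSE_DIM)],
            "twist": [rng.uniform(-1, 1) for _ in range(TWIST_DIM)],
            "wrench": [rng.uniform(-1, 1) for _ in range(WRENCH_DIM)],
        }
        features = student_obs(obs)
        target = predict(teacher_w, features)
        losses.append(distill_step(student_w, features, target))
    return losses[0], losses[-1]


if __name__ == "__main__":
    first, last = run_demo()
    print(f"imitation loss: first step {first:.4f}, last step {last:.2e}")
```

In this toy setting the teacher's action depends only on signals the student can observe, so the imitation loss can be driven close to zero. With a real vision-enabled teacher that is not guaranteed, and the abstract does not state which distillation loss or network architecture the authors use.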
Problem

Research questions and friction points this paper is trying to address.

vision-enabled policy
overfitting
robustness
transferability
contact-rich manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-free policy
knowledge distillation
real-world reinforcement learning
contact-rich manipulation
robust generalization
Victor Kowalski
Autonomous Systems, Technische Universitaet Wien (TU Wien), Vienna, Austria
Chengxi Li
Autonomous Systems, Technische Universitaet Wien (TU Wien), Vienna, Austria
Dongheui Lee
Professor, Technische Universität Wien (TU Wien) // German Aerospace Center (DLR)
Robotics, Machine Learning, Human Robot Interaction, Humanoid robots