End-to-end example-based sim-to-real RL policy transfer based on neural stylisation with application to robotic cutting

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of reinforcement learning policies transferred from simulation to real robotic systems, particularly in contact-rich tasks such as cutting unknown materials. In these tasks, the domain gap and scarce real-world data, including the absence of ground-truth reward signals, pose significant challenges. The authors propose an end-to-end, example-based approach that reinterprets neural style transfer as the stylization of time-series trajectories. A variational autoencoder jointly learns self-supervised feature representations and generates weakly paired source-target trajectories from unpaired, unlabeled real-world data. This improves the physical realism of the synthesized trajectories and enables policy adaptation without real-world reward signals. The approach is compared with baselines including the authors' previous work, CycleGAN, and conditional variational autoencoder-based time-series translation. Against these, it improves task completion time and behavioral stability with minimal real-world data and remains robust to variation in geometry and material.
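The idea of treating neural style transfer as trajectory stylization can be illustrated with the classic Gatys-style formulation applied to a multichannel time series. The following is a minimal sketch under stated assumptions, not the authors' method. It uses the raw trajectory channels as features, whereas the paper learns features with a variational autoencoder. It matches length-normalized Gram matrices as the style loss and keeps a mean-squared content loss to the simulated trajectory. It then optimizes the trajectory directly with plain gradient descent. All function names, signals, weights and step sizes are hypothetical.

```python
import numpy as np


def gram(x: np.ndarray) -> np.ndarray:
    """Length-normalized Gram matrix of a (channels, T) trajectory.

    Entry (i, j) is the time average of channel_i * channel_j: it captures
    second-order "texture" statistics but discards when events happen.
    """
    return x @ x.T / x.shape[1]


def style_loss(x: np.ndarray, style: np.ndarray) -> float:
    """Squared Frobenius distance between Gram matrices (Gatys-style loss)."""
    return float(np.sum((gram(x) - gram(style)) ** 2))


def content_loss(x: np.ndarray, content: np.ndarray) -> float:
    """Mean squared deviation from the source (e.g. simulated) trajectory."""
    return float(np.mean((x - content) ** 2))


def stylise(content, style, w_style=1.0, w_content=1.0, lr=1.0, steps=300):
    """Gradient descent on the trajectory itself, starting from `content`.

    content: (channels, T) source trajectory.
    style:   (channels, T') unpaired example; T' may differ from T because
             the Gram matrices are normalized by length.
    Analytic gradients of the two losses above:
      d style_loss / dx   = 4 / T * (gram(x) - gram(style)) @ x
      d content_loss / dx = 2 * (x - content) / x.size
    """
    target = gram(style)
    x = np.array(content, dtype=float)  # copy, so `content` is left untouched
    T = x.shape[1]
    for _ in range(steps):
        grad_style = 4.0 / T * (gram(x) - target) @ x
        grad_content = 2.0 * (x - content) / x.size
        x -= lr * (w_style * grad_style + w_content * grad_content)
    return x


# Hypothetical example: a smooth "simulated" two-channel signal and a noisier,
# larger-amplitude "real" example of a different length (no pairing needed).
t = np.linspace(0.0, 4.0 * np.pi, 200)
sim = np.stack([np.sin(t), np.cos(t)])
rng = np.random.default_rng(0)
real = 2.0 * rng.standard_normal((2, 150))

out = stylise(sim, real)
print(style_loss(sim, real), style_loss(out, real), content_loss(out, sim))
```

Because the Gram matrix averages over time, the style example need not be aligned with, or even the same length as, the source trajectory, which is what makes unpaired data usable in this formulation. The content term is what keeps the stylized trajectory anchored to the original motion; raising `w_content` trades style fidelity for closeness to the simulated trajectory.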

📝 Abstract
Although reinforcement learning has been applied successfully to a range of robotic control problems in complex, uncertain environments, its reliance on extensive data, typically sourced from simulation, limits real-world deployment. This is because of the domain gap between simulated and physical systems and the limited availability of real-world samples. We propose a novel method for sim-to-real transfer of reinforcement learning policies that reinterprets neural style transfer from image processing to synthesise novel training data from unpaired, unlabelled real-world datasets. We employ a variational autoencoder to jointly learn self-supervised feature representations for style transfer and to generate weakly paired source-target trajectories that improve the physical realism of the synthesised trajectories. We demonstrate the approach on a case study of robotic cutting of unknown materials. Compared with baseline methods, including our previous work, CycleGAN and conditional variational autoencoder-based time-series translation, our approach achieves improved task completion time and behavioural stability with minimal real-world data. Our framework is robust to geometric and material variation and highlights the feasibility of policy adaptation in challenging contact-rich tasks where real-world reward information is unavailable.
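In its standard formulation, a variational autoencoder of the kind mentioned in the abstract is trained by minimising the negative evidence lower bound (ELBO). This objective combines a reconstruction error with a KL term that pulls the approximate posterior towards a unit Gaussian prior. The sketch below shows only that generic objective, using the reparameterisation trick for a diagonal-Gaussian encoder. The linear encoder and decoder, their weights, the fixed log-variance, the 40-sample window size and the `beta` weight are hypothetical placeholders. The paper's architecture, its self-supervised terms and its weak-pairing procedure are not reproduced here. NumPy only evaluates the loss; training would require an automatic-differentiation framework.

```python
import numpy as np


def reparameterise(mu, logvar, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, I); keeps z differentiable in (mu, logvar)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def gaussian_kl(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)


def negative_elbo(x, encode, decode, rng, beta=1.0):
    """Per-sample negative ELBO for a batch of flattened trajectory windows.

    x:      (batch, features) array.
    encode: callable returning (mu, logvar), each of shape (batch, latent).
    decode: callable mapping (batch, latent) -> (batch, features).
    The squared-error term matches a fixed-variance Gaussian likelihood up to
    constants; `beta` absorbs the resulting scale difference against the KL term.
    """
    mu, logvar = encode(x)
    z = reparameterise(mu, logvar, rng)
    reconstruction = np.sum((x - decode(z)) ** 2, axis=-1)
    return reconstruction + beta * gaussian_kl(mu, logvar)


# Hypothetical linear encoder/decoder: 40-dim trajectory window <-> 4-dim latent.
rng = np.random.default_rng(0)
W_enc = 0.1 * rng.standard_normal((40, 4))
W_dec = 0.1 * rng.standard_normal((4, 40))


def encode(x):
    # Placeholder: fixed log-variance of -1 for every latent dimension.
    return x @ W_enc, np.full((x.shape[0], 4), -1.0)


def decode(z):
    return z @ W_dec


windows = rng.standard_normal((8, 40))  # stand-in for 8 flattened trajectory windows
loss = negative_elbo(windows, encode, decode, rng)
print(loss.mean())
```

In a setup like this, the encoder outputs are the learned representation that could stand in for raw channels as features for style transfer; exactly how the paper couples the two is not specified in the abstract.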

Problem

Research questions and friction points this paper is trying to address.

sim-to-real transfer
reinforcement learning
domain gap
robotic cutting
real-world deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

neural style transfer
sim-to-real transfer
variational autoencoder
reinforcement learning
robotic cutting
Jamie Hathaway
(1) Extreme Robotics Lab, School of Metallurgy and Materials, University of Birmingham, Birmingham, B15 2TT, UK; (2) The Faraday Institution, Quad One, Harwell Science and Innovation Campus, Didcot, OX11 0RA, UK
Alireza Rastegarpanah
Co-founder of Extreme Robotics Lab, University of Birmingham
Robotic Disassembly, AI-driven Robotics, Robotic Remanufacturing, Medical Robotics
Rustam Stolkin
Chair of Robotics, UoB. Royal Society Industry Fellow. Director A.R.M Robotics Ltd.
Robotics, AI, Computer Vision, Manipulation, Human-Robot Interaction