PRISM: Projection-based Reward Integration for Scene-Aware Real-to-Sim-to-Real Transfer with Few Demonstrations

📅 2025-04-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Robot policy generalization under few-shot demonstrations remains limited when initial robot and object poses vary; real-robot training is unsafe, and high-fidelity simulation construction is costly. Method: This paper proposes an end-to-end, real-scene-driven sim-to-real reinforcement learning closed-loop framework. It introduces a vision-language model (VLM)-guided projection-relation reward modeling mechanism that enables single-frame image-based object recognition, 6D pose estimation, and 3D model retrieval, automatically generating physically consistent simulation scenes. Policy training follows a two-stage paradigm: VLM-supervised reward pretraining followed by expert demonstration fine-tuning. Contribution/Results: On real robotic arms, the method achieves robust cross-pose deployment with only 3–5 demonstrations. Sim-to-real transfer success rate improves by 42%, and scene construction time decreases by 90%.

Technology Category

Application Category

📝 Abstract
Learning from few demonstrations to develop policies robust to variations in robot initial positions and object poses is a problem of significant practical interest in robotics. Compared to imitation learning, which often struggles to generalize from limited samples, reinforcement learning (RL) can autonomously explore to obtain robust behaviors. Training RL agents through direct interaction with the real world is often impractical and unsafe, while building simulation environments requires extensive manual effort, such as designing scenes and crafting task-specific reward functions. To address these challenges, we propose an integrated real-to-sim-to-real pipeline that constructs simulation environments based on expert demonstrations by identifying scene objects from images and retrieving their corresponding 3D models from existing libraries. We introduce a projection-based reward model for RL policy training that is supervised by a vision-language model (VLM) using human-guided object projection relationships as prompts, with the policy further fine-tuned using expert demonstrations. In general, our work focuses on the construction of simulation environments and RL-based policy training, ultimately enabling the deployment of reliable robotic control policies in real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

Learning robust policies from few demonstrations
Building simulation environments without manual effort
Training RL agents safely and efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated real-to-sim-to-real pipeline for environment construction
Projection-based reward model supervised by vision-language model
Policy fine-tuning using expert demonstrations for robustness
🔎 Similar Papers
No similar papers found.
Haowen Sun
Haowen Sun
Department of Automation, Tsinghua University
Computer Vision
H
Han Wang
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, 710049
Chengzhong Ma
Chengzhong Ma
Unknown affiliation
S
Shaolong Zhang
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, 710049
J
Jiawei Ye
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, 710049
X
Xingyu Chen
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, 710049
X
Xuguang Lan
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, 710049