🤖 AI Summary
This work addresses the challenge of zero-shot transfer of vision-language model (VLM)-guided reinforcement learning (RL) policies from simulation to real-world autonomous vehicles, which is hindered by mismatches in observation modalities and action semantics between simulated and physical environments. To bridge this sim-to-real gap, the authors propose a modular transfer framework comprising a geometric observation bridge (GOB), physics-aware action mapping (PAM), two-phase progressive training (TPT), and a real-time deployment pipeline (RDP). This framework enables, for the first time, zero-shot deployment of VLM-RL policies trained in CARLA onto a full-scale physical vehicle without any real-world RL training data, while preserving multi-task performance ranking consistency. On a Ford E-Transit, the approach achieves zero-shot success rates of 90%, 80%, and 75% on car-following, obstacle avoidance, and stop-sign interaction tasks, respectively.
📝 Abstract
Deploying reinforcement learning policies trained in simulation on real autonomous vehicles remains a fundamental challenge, particularly for VLM-guided RL frameworks, whose policies are typically learned with simulator-native observations and simulator-coupled action semantics that are unavailable on physical platforms. This paper presents Sim2Real-AD, a modular framework for zero-shot sim-to-real transfer of CARLA-trained VLM-guided RL policies to full-scale vehicles without any real-world RL training data. The framework decomposes the transfer problem into four components: a Geometric Observation Bridge (GOB) that converts monocular front-view images into simulator-compatible bird's-eye-view (BEV) observations, a Physics-Aware Action Mapping (PAM) that translates policy outputs into platform-agnostic physical commands, a Two-Phase Progressive Training (TPT) strategy that stabilizes adaptation by separating action-space and observation-space transfer, and a Real-time Deployment Pipeline (RDP) that integrates perception, policy inference, control conversion, and safety monitoring for closed-loop execution. Simulation experiments show that the framework preserves the relative performance ordering of representative RL algorithms across different reward paradigms, and ablations validate the contribution of each module. Zero-shot deployment on a full-scale Ford E-Transit achieves success rates of 90%, 80%, and 75% in car-following, obstacle avoidance, and stop-sign interaction scenarios, respectively. To the best of our knowledge, this study is among the first to demonstrate zero-shot closed-loop deployment of a CARLA-trained VLM-guided RL policy on a full-scale real vehicle without any real-world RL training data. The demo video and code are available at: https://zilin-huang.github.io/Sim2Real-AD-website/.