🤖 AI Summary
Reinforcement learning (RL) has seen limited adoption in autonomous driving (AD) because standardized, efficient research frameworks are lacking. Method: This paper introduces V-Max, an open research framework for RL in AD. It is built on Waymax, a hardware-accelerated simulator, and extended following ScenarioNet's approach so that diverse AD datasets can be simulated quickly. The framework provides a set of observation and reward functions, Transformer-based encoders, training pipelines, adversarial evaluation settings, and an extensive set of evaluation metrics. Contribution/Results: A large-scale benchmark analyzes how network architecture, observation function, training data, and reward shaping affect RL performance. The framework is intended as shared infrastructure for reproducible, rigorously evaluated RL research in AD.
📝 Abstract
Learning-based decision-making has the potential to enable generalizable Autonomous Driving (AD) policies, reducing the engineering overhead of rule-based approaches. Imitation Learning (IL) remains the dominant paradigm, benefiting from large-scale human demonstration datasets, but it suffers from inherent limitations such as distribution shift and imitation gaps. Reinforcement Learning (RL) presents a promising alternative, yet its adoption in AD remains limited due to the lack of standardized and efficient research frameworks. To this end, we introduce V-Max, an open research framework providing all the necessary tools to make RL practical for AD. V-Max is built on Waymax, a hardware-accelerated AD simulator designed for large-scale experimentation. We extend it using ScenarioNet's approach, enabling the fast simulation of diverse AD datasets. V-Max integrates a set of observation and reward functions, transformer-based encoders, and training pipelines. Additionally, it includes adversarial evaluation settings and an extensive set of evaluation metrics. Through a large-scale benchmark, we analyze how network architectures, observation functions, training data, and reward shaping impact RL performance.
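To make the idea of modular reward functions and reward shaping concrete, the following is a minimal sketch in plain Python. It is not V-Max's or Waymax's API: the signal names (`collided`, `offroad`, `progress_m`), the `WeightedReward` class, and the weights are all hypothetical, chosen only to illustrate how independent reward terms might be combined into one scalar reward.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence, Tuple

# Hypothetical per-step signals. These keys are illustrative only and are
# not taken from V-Max or Waymax.
StepInfo = Mapping[str, float]
RewardTerm = Callable[[StepInfo], float]


def collision_penalty(info: StepInfo) -> float:
    """Return -1 if the ego vehicle collided on this step, else 0."""
    return -1.0 if info["collided"] > 0.0 else 0.0


def offroad_penalty(info: StepInfo) -> float:
    """Return -1 if the ego vehicle left the drivable area, else 0."""
    return -1.0 if info["offroad"] > 0.0 else 0.0


def progress_reward(info: StepInfo) -> float:
    """Return the distance (in metres) advanced along the route this step."""
    return info["progress_m"]


@dataclass(frozen=True)
class WeightedReward:
    """Scalar reward built as a weighted sum of independent terms (assumed design)."""

    terms: Sequence[Tuple[float, RewardTerm]]

    def __call__(self, info: StepInfo) -> float:
        return sum(weight * term(info) for weight, term in self.terms)


# Assumed weights; changing them (or the list of terms) is one way a
# reward-shaping variant could be expressed.
reward_fn = WeightedReward(
    terms=(
        (1.0, collision_penalty),
        (0.5, offroad_penalty),
        (0.1, progress_reward),
    )
)

safe_step = {"collided": 0.0, "offroad": 0.0, "progress_m": 2.0}
crash_step = {"collided": 1.0, "offroad": 0.0, "progress_m": 2.0}

print(reward_fn(safe_step))   # progress only
print(reward_fn(crash_step))  # progress minus the collision penalty
```

Under this assumed design, a reward-shaping comparison amounts to building several `WeightedReward` instances with different terms or weights and training one policy per instance, while the rest of the pipeline stays unchanged.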