🤖 AI Summary
This work addresses training instability and unsafe action outputs in reinforcement learning for high-speed autonomous racing, a domain characterized by high dynamics and strong nonlinearities. The authors propose TraD-RL, a method that integrates expert-trajectory-guided state representation and reward shaping to explicitly embed vehicle dynamics priors, while employing control barrier functions to construct a safety envelope of admissible actions. Coupled with a multi-stage curriculum learning strategy, the policy transitions smoothly from expert-guided initialization to autonomous exploration. Evaluated in a high-fidelity simulation of the Tempelhof Airport Street Circuit, the approach significantly improves lap time and driving stability, surpassing expert-level performance while jointly optimizing racing efficiency and safety.
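The summary's "expert trajectory-guided ... reward shaping" can be illustrated with a minimal sketch. Everything here (function name, waypoint representation, weights `w_prog`/`w_dev`, using speed as a progress proxy) is an assumption for illustration, not the paper's actual reward:

```python
import numpy as np

def shaped_reward(pos, speed, expert_line, w_prog=1.0, w_dev=0.5):
    """Hypothetical trajectory-guided shaped reward (weights and form assumed,
    not taken from the paper): reward progress along a precomputed expert
    racing line and penalize lateral deviation from it.

    pos:         (x, y) position of the vehicle
    speed:       scalar speed, used here as a crude progress proxy
    expert_line: (N, 2) array of waypoints along the expert racing line
    """
    dists = np.linalg.norm(expert_line - np.asarray(pos, dtype=float), axis=1)
    deviation = float(dists.min())     # distance to nearest expert waypoint
    return w_prog * speed - w_dev * deviation
```

A real implementation would measure progress as arc length advanced along the track centerline per step rather than raw speed, but the trade-off it encodes (go fast, stay near the expert line) is the same.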
📝 Abstract
Reinforcement learning has demonstrated significant potential in autonomous driving. However, it suffers from training instability and unsafe action outputs in autonomous racing environments characterized by high dynamics and strong nonlinearities. To this end, this paper proposes a trajectory-guided and dynamics-constrained reinforcement learning (TraD-RL) method for autonomous racing. The key features of this method are as follows: 1) leveraging a prior expert racing line to construct an augmented state representation and facilitate reward shaping, thereby integrating domain knowledge to stabilize early-stage policy learning; 2) embedding explicit vehicle dynamics priors into a safe operating envelope formulated via control barrier functions to enable safety-constrained learning; and 3) adopting a multi-stage curriculum learning strategy that shifts from expert-guided learning to autonomous exploration, allowing the learned policy to surpass expert-level performance. The proposed method is evaluated in a high-fidelity simulation environment modeled after the Tempelhof Airport Street Circuit. Experimental results demonstrate that TraD-RL improves both lap speed and driving stability, achieving a synergistic optimization of racing performance and safety.
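The "safe operating envelope formulated via control barrier functions" (feature 2) can be sketched as a discrete-time CBF safety filter on the policy's action. This is a generic single-axis illustration, not the paper's formulation: the state model, barrier choice, and all constants (`d_max`, `gamma`, `a_lim`) are assumptions:

```python
import numpy as np

def cbf_filter(d, v, a_rl, dt=0.05, d_max=3.0, gamma=0.2, a_lim=4.0):
    """Minimal discrete-time CBF safety filter (illustrative sketch).

    d:    distance already travelled toward the track edge [m]
    v:    rate of approach toward that edge [m/s]
    a_rl: nominal acceleration toward the edge proposed by the RL policy

    Barrier: h(x) = d_max - d  (h >= 0 means inside the safe envelope).
    Double-integrator step: d' = d + v*dt + 0.5*a*dt**2.
    Enforcing h(x') >= (1 - gamma) * h(x) lets h shrink by at most a
    factor gamma per step, so h stays nonnegative for all time.
    """
    h = d_max - d
    # Largest edge-ward acceleration still satisfying the CBF condition:
    a_cbf = (gamma * h - v * dt) / (0.5 * dt**2)
    # Pass the RL action through unchanged unless it violates the barrier,
    # then clip the result to actuator limits.
    return float(np.clip(min(a_rl, a_cbf), -a_lim, a_lim))
```

A full method would instead solve a small QP minimizing the deviation from the RL action subject to CBF constraints over the actual vehicle dynamics; the key idea is the same: the learned policy proposes, and the barrier condition disposes.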