🤖 AI Summary
Offline imitation learning (IL) faces key challenges in transferring to real-world closed-loop driving, including distributional shift, low sample efficiency, and difficulty in implicit world modeling. To address these, this paper proposes an online imitation learning framework based on the Decision Transformer. We introduce a novel multi-token joint prediction architecture that unifies open-loop trajectory planning and closed-loop control within a single model. Additionally, we incorporate online policy fine-tuning and prioritized experience replay to mitigate distributional shift and improve data utilization efficiency. Evaluated on the Waymax benchmark, our method reduces collision rate by 41% and increases goal-reaching rate by 18% compared with the baseline method, and it outperforms popular IL and RL algorithms in both open-loop and closed-loop settings. To our knowledge, this is the first work to jointly optimize both high-fidelity open-loop sequence prediction and responsive closed-loop adaptation in autonomous driving decision-making.
📝 Abstract
Recent advances in autonomous driving depend on the ability to process and learn effectively from extensive real-world driving data. Current imitation learning and offline reinforcement learning methods have shown remarkable promise in autonomous systems, harnessing offline datasets to make informed decisions in open-loop (non-reactive agents) settings. However, learning-based agents face significant challenges when transferring knowledge from open-loop to closed-loop (reactive agents) environments. Their performance is significantly affected by data distribution shift, limited sample efficiency, and the complexity of uncovering hidden world models and physics. To address these issues, we propose the Sample-efficient Imitative Multi-token Decision Transformer (SimDT). SimDT introduces multi-token prediction, an online imitative learning pipeline, and prioritized experience replay to sequence-modelling reinforcement learning. Performance is evaluated through empirical experiments, and the results exceed those of popular imitation and reinforcement learning algorithms in both open-loop and closed-loop settings on the Waymax benchmark. SimDT exhibits a 41% reduction in collision rate and an 18% improvement in reaching the destination compared with the baseline method.
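To make the prioritized experience replay component more concrete, here is a minimal sketch of a proportional prioritized replay buffer in Python. It is not SimDT's implementation: the abstract does not specify the prioritization scheme, so the class name `PrioritizedReplay`, the parameters `alpha` and `eps`, and the use of an absolute error as the priority signal are all assumptions chosen for illustration, following the commonly used proportional variant. Importance-sampling correction weights are omitted for brevity.

```python
import random


class PrioritizedReplay:
    """Toy proportional prioritized replay buffer (hypothetical, illustrative only)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # assumed knob: 0 gives uniform sampling, larger skews more
        self.eps = eps          # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0      # slot to overwrite once the buffer is full

    def _priority(self, error):
        # Priority grows with the magnitude of the error signal.
        return (abs(error) + self.eps) ** self.alpha

    def add(self, item, error=None):
        # Items without an error estimate get the current maximum priority,
        # so they are likely to be sampled at least once.
        if error is None:
            p = max(self.priorities, default=1.0)
        else:
            p = self._priority(error)
        if len(self.items) < self.capacity:
            self.items.append(item)
            self.priorities.append(p)
        else:
            self.items[self.next_slot] = item
            self.priorities[self.next_slot] = p
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, k, rng=random):
        # Draw k indices with probability proportional to stored priority.
        idx = rng.choices(range(len(self.items)), weights=self.priorities, k=k)
        return idx, [self.items[i] for i in idx]

    def update(self, indices, errors):
        # Refresh priorities after the sampled items have been re-evaluated.
        for i, e in zip(indices, errors):
            self.priorities[i] = self._priority(e)
```

In a training loop of this shape, one would add each new trajectory segment, sample a batch, compute a per-item loss, and pass those losses back through `update` so that poorly fitted segments are revisited more often. How SimDT defines priority and combines it with online imitative learning is not stated in the abstract.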