Sample-efficient Imitative Multi-token Decision Transformer for Real-world Driving

📅 2024-06-18

🤖 AI Summary
Offline imitation learning (IL) struggles to transfer to real-world closed-loop driving because of distribution shift, low sample efficiency, and the difficulty of implicitly modeling the world. To address these challenges, the paper proposes SimDT, an online imitation learning framework based on the Decision Transformer. Its multi-token joint prediction architecture unifies open-loop trajectory planning and closed-loop control within a single model, while online policy fine-tuning and prioritized experience replay mitigate distribution shift and improve data utilization. Evaluated on the Waymax benchmark, the method reduces collision rate by 41% and increases goal-reaching rate by 18% compared with the baseline method. It is presented as the first work to jointly optimize high-fidelity open-loop sequence prediction and responsive closed-loop adaptation in autonomous driving decision-making.
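To make the multi-token idea concrete, the sketch below builds training targets in which each timestep is paired with the next `k` actions instead of a single one. It is a minimal illustration only: the function name `multi_token_targets`, the padding scheme, and the mask layout are assumptions made for this example, and the summary does not say how SimDT actually forms its targets.

```python
def multi_token_targets(actions, k, pad=None):
    """Pair each timestep t with the window actions[t:t+k] (hypothetical helper).

    Windows that run past the end of the episode are padded with `pad`,
    and a boolean mask marks which positions hold real actions so that
    padded positions can be excluded from a loss.
    """
    targets, masks = [], []
    for t in range(len(actions)):
        window = list(actions[t:t + k])
        valid = len(window)
        targets.append(window + [pad] * (k - valid))
        masks.append([True] * valid + [False] * (k - valid))
    return targets, masks


# Example: a three-step episode with a prediction horizon of two tokens.
targets, masks = multi_token_targets(["a0", "a1", "a2"], k=2)
```

Under this assumed layout, an open-loop rollout could execute a whole predicted window, while a closed-loop rollout could execute only the first action of each window and then predict again from the new observation.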

📝 Abstract
Recent advances in autonomous driving depend on the ability to process and learn from extensive real-world driving data effectively. Current imitation learning and offline reinforcement learning methods have shown remarkable promise in autonomous systems, harnessing offline datasets to make informed decisions in open-loop (non-reactive agent) settings. However, learning-based agents face significant challenges when transferring knowledge from open-loop to closed-loop (reactive agent) environments. Performance is significantly affected by data distribution shift, sample efficiency, and the complexity of uncovering hidden world models and physics. To address these issues, we propose the Sample-efficient Imitative Multi-token Decision Transformer (SimDT). SimDT introduces multi-token prediction, an online imitative learning pipeline, and prioritized experience replay to sequence-modelling reinforcement learning. Performance is evaluated through empirical experiments, and the results exceed those of popular imitation and reinforcement learning algorithms in both open-loop and closed-loop settings on the Waymax benchmark. SimDT exhibits a 41% reduction in collision rate and an 18% improvement in reaching the destination compared with the baseline method.
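As a rough illustration of the prioritized experience replay component named in the abstract, the sketch below implements the generic proportional variant: items are sampled with probability proportional to their priority raised to `alpha`, and importance-sampling weights controlled by `beta` correct for the resulting bias. The class name, parameters, and default values are assumptions for this example; the abstract does not describe how SimDT defines priorities or what it stores in the buffer.

```python
import random


class PrioritizedReplayBuffer:
    """Generic proportional prioritized replay (illustrative, not SimDT's code)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6, seed=None):
        self.capacity = capacity
        self.alpha = alpha  # 0 gives uniform sampling, 1 gives fully proportional
        self.beta = beta    # strength of the importance-sampling correction
        self.eps = eps      # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0  # ring-buffer write position
        self.rng = random.Random(seed)

    def add(self, item):
        # New items receive the current maximum priority so they are
        # likely to be sampled soon after insertion.
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(item)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest entry.
            self.items[self.next_slot] = item
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.items)
        indices = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance weights, normalized so the largest weight in the batch is 1.
        weights = [(n * probs[i]) ** (-self.beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return indices, [self.items[i] for i in indices], weights

    def update_priorities(self, indices, errors):
        # Larger absolute error means the item is replayed more often.
        for i, err in zip(indices, errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, one would typically call `sample`, scale each item's loss by its returned weight, and then pass the per-item errors back through `update_priorities`.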
Problem

Research questions and friction points this paper is trying to address.

Resolving conflicting objectives in imitation and reinforcement learning
Improving sample efficiency in dynamic closed-loop environments
Uncovering hidden world models and physics in autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-informed Imitative Reinforcement Learning approach
Joint optimization of expert and exploratory data
Naturally emerging vehicle dynamics from training