🤖 AI Summary
To address challenges in spatiotemporal reasoning and uncertainty modeling for dense, complex driving scenarios, this paper proposes the Uncertainty-Weighted Decision Transformer (UWDT). UWDT takes multi-channel bird’s-eye-view occupancy grids as input and employs a long-sequence Transformer to jointly model spatiotemporal dependencies. Crucially, it uses a frozen teacher model to estimate token-level predictive entropy, which then weights the student model's loss so that training emphasizes rare, high-risk states. Evaluated in roundabout simulations at multiple traffic densities, UWDT reduces collision rates relative to the baselines while improving cumulative reward and behavioral stability, indicating robust end-to-end decision-making under highly dynamic traffic conditions. The core contribution is the integration of token-level uncertainty-aware weighting into the Decision Transformer framework for data-efficient, risk-sensitive navigation learning.
📝 Abstract
Autonomous driving in dense, dynamic environments requires decision-making systems that can exploit both spatial structure and long-horizon temporal dependencies while remaining robust to uncertainty. This work presents a novel framework that integrates multi-channel bird's-eye-view occupancy grids with transformer-based sequence modeling for tactical driving in complex roundabout scenarios. To address the imbalance between frequent low-risk states and rare safety-critical decisions, we propose the Uncertainty-Weighted Decision Transformer (UWDT). UWDT employs a frozen teacher transformer to estimate per-token predictive entropy, which is then used as a weight in the student model's loss function. This mechanism amplifies learning from uncertain, high-impact states while maintaining stability across common low-risk transitions. Experiments in a roundabout simulator, across varying traffic densities, show that UWDT consistently outperforms the baseline methods in terms of reward, collision rate, and behavioral stability. The results demonstrate that uncertainty-aware, spatiotemporal transformers can deliver safer and more efficient decision-making for autonomous driving in complex traffic environments.
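The entropy-weighting idea can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything specific here is an assumption made for illustration: the weight form `1 + alpha * H_norm`, the normalization of entropy by `log(num_actions)`, averaging over tokens, and all function and parameter names (`token_weights`, `uncertainty_weighted_loss`, `alpha`). The only part taken from the source is the overall scheme: a frozen teacher's per-token predictive entropy scales the student's per-token loss.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one token's action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over the logits."""
    log_p = log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp in log_p)

def token_weights(teacher_logits, alpha=1.0):
    """Assumed weight form: 1 + alpha * (teacher entropy / max possible entropy)."""
    weights = []
    for logits in teacher_logits:
        h_norm = entropy(logits) / math.log(len(logits))  # scaled to [0, 1]
        weights.append(1.0 + alpha * h_norm)
    return weights

def uncertainty_weighted_loss(student_logits, teacher_logits, targets, alpha=1.0):
    """Mean over tokens of (teacher-derived weight * student cross-entropy)."""
    weights = token_weights(teacher_logits, alpha)
    total = 0.0
    for s_logits, w, y in zip(student_logits, weights, targets):
        ce = -log_softmax(s_logits)[y]  # student cross-entropy for this token
        total += w * ce
    return total / len(targets)

# Two tokens, three discrete actions each (toy values, not from the paper).
teacher = [[0.0, 0.0, 0.0],   # teacher is maximally uncertain
           [10.0, 0.0, 0.0]]  # teacher is confident
student = [[0.5, 0.2, -0.1],
           [2.0, 0.1, 0.0]]
targets = [1, 0]

print(token_weights(teacher))
print(uncertainty_weighted_loss(student, teacher, targets))
```

In this sketch the teacher's logits only produce weights and are never updated, which mirrors the "frozen teacher" described above. Setting `alpha=0` recovers the ordinary unweighted cross-entropy, so `alpha` controls how strongly uncertain tokens are emphasized.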