Enhancing End-to-End Autonomous Driving with Latent World Model

📅 2024-06-12
🏛️ arXiv.org
📈 Citations: 6
Influential: 1
🤖 AI Summary
To address insufficient scene representation in end-to-end autonomous driving, this paper proposes a self-supervised learning framework based on a Latent World Model (LAW), the first to incorporate latent world modeling into end-to-end driving systems. LAW jointly predicts future scene features and ego-vehicle trajectories, enabling perception-planning co-optimization without explicit annotations, while remaining compatible with mainstream backbones such as BEVFormer and TransFuser. It supports both perception-free and perception-based paradigms, offering strong generalization and temporal modeling capabilities. Evaluated on three major benchmarks—nuScenes (open-loop), NAVSIM, and CARLA (closed-loop)—LAW achieves state-of-the-art performance, significantly improving trajectory prediction accuracy and closed-loop driving success rate.

📝 Abstract
In autonomous driving, end-to-end planners directly utilize raw sensor data, enabling them to extract richer scene features and reduce information loss compared to traditional planners. This raises a crucial research question: how can we develop better scene feature representations to fully leverage sensor data in end-to-end driving? Self-supervised learning methods show great success in learning rich feature representations in NLP and computer vision. Inspired by this, we propose a novel self-supervised learning approach using the LAtent World model (LAW) for end-to-end driving. LAW predicts future scene features based on current features and ego trajectories. This self-supervised task can be seamlessly integrated into perception-free and perception-based frameworks, improving scene feature learning and optimizing trajectory prediction. LAW achieves state-of-the-art performance across multiple benchmarks, including real-world open-loop benchmark nuScenes, NAVSIM, and simulator-based closed-loop benchmark CARLA. The code is released at https://github.com/BraveGroup/LAW.
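The abstract's core idea—predicting future scene features from current features and ego trajectories as a self-supervised signal—can be sketched as a toy objective. The dimensions, the linear predictor, and all variable names below are illustrative assumptions, not the paper's architecture; in LAW the features would come from a learned image/BEV encoder and the predictor would be a neural world model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): D-dim latent scene feature,
# A-dim ego trajectory/action embedding.
D, A = 16, 4

# A minimal stand-in for the latent world model:
# f_{t+1} ≈ W @ concat(f_t, a_t).  LAW uses a learned network here.
W = rng.normal(scale=0.1, size=(D, D + A))

def predict_next(feature, action):
    """Predict the next latent scene feature from the current feature
    and the ego trajectory (action) embedding."""
    return W @ np.concatenate([feature, action])

def latent_loss(pred, target):
    """Self-supervised objective: match the predicted latent to the
    encoder feature of the actually observed next frame (MSE)."""
    return float(np.mean((pred - target) ** 2))

# Toy rollout step: current feature, ego action, observed next feature.
f_t = rng.normal(size=D)
a_t = rng.normal(size=A)
f_next = rng.normal(size=D)   # produced by the scene encoder in practice

pred = predict_next(f_t, a_t)
loss = latent_loss(pred, f_next)
```

Minimizing this loss requires no annotations—the supervision is simply the next frame's own features—which is why the task can be bolted onto either perception-free or perception-based planners.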
Problem

Research questions and friction points this paper is trying to address.

Improving scene feature representations for end-to-end autonomous driving
Developing self-supervised learning methods for better sensor data utilization
Enhancing trajectory prediction through latent world model integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning for scene feature representation
A latent world model that predicts future scene features
Integration into perception-free and perception-based frameworks
Yingyan Li
Institute of Automation, Chinese Academy of Sciences
computer vision
Lue Fan
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Jiawei He
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Yu-Quan Wang
Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Yuntao Chen
Miromind
agentic AI, multimodal model, computer vision
Zhaoxiang Zhang
Institute of Automation, Chinese Academy of Sciences
Computer Vision, Pattern Recognition, Biologically-inspired Learning
Tieniu Tan
Institute of Automation, Chinese Academy of Sciences