Driving on Registers

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the challenge of achieving efficiency, accuracy, and controllability at the same time in end-to-end autonomous driving by proposing a lightweight architecture built on a pretrained Vision Transformer (ViT). The approach introduces camera-aware register tokens that compress multi-camera features into a compact scene representation, and uses two lightweight transformer decoders: one generates candidate trajectories and the other scores them. The scoring decoder predicts interpretable sub-scores (such as safety, comfort, and efficiency), which enables behavior-conditioned driving at inference. Evaluated on NAVSIM-v1, NAVSIM-v2, and the photorealistic closed-loop HUGSIM benchmark, the method outperforms or matches strong contemporary baselines, suggesting that a pure-transformer design with targeted token compression is sufficient for accurate, efficient, and adaptive driving.
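To make the token-compression idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes each camera has its own register tokens that are appended to that camera's patch tokens, passed through a single self-attention layer, and then kept while the patch tokens are dropped. The function names (`compress_camera`, `self_attention`), the single-head attention, the random weights, and all sizes (4 cameras, 196 patches, 16 registers, width 32) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    # Single-head self-attention over a (sequence, dim) array.
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v

def compress_camera(patch_tokens, register_tokens, weights):
    # Append the camera's registers to its patch tokens, attend jointly,
    # then keep only the register outputs as the compressed summary.
    seq = np.concatenate([patch_tokens, register_tokens], axis=0)
    out = self_attention(seq, *weights)
    return out[-register_tokens.shape[0]:]

# Hypothetical sizes, chosen only for illustration.
num_cameras, num_patches, num_registers, dim = 4, 196, 16, 32
rng = np.random.default_rng(0)
weights = tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
camera_registers = rng.normal(size=(num_cameras, num_registers, dim))  # one set per camera
images = rng.normal(size=(num_cameras, num_patches, dim))              # stand-in patch features

scene_tokens = np.concatenate(
    [compress_camera(images[c], camera_registers[c], weights) for c in range(num_cameras)],
    axis=0,
)
print(scene_tokens.shape)  # (64, 32)
```

Under these assumed sizes, downstream decoders would attend over 64 scene tokens instead of 4 × 196 = 784 patch tokens, which is where the reduction in downstream computation comes from.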

📝 Abstract
We present DrivoR, a simple and efficient transformer-based architecture for end-to-end autonomous driving. Our approach builds on pretrained Vision Transformers (ViTs) and introduces camera-aware register tokens that compress multi-camera features into a compact scene representation, significantly reducing downstream computation without sacrificing accuracy. These tokens drive two lightweight transformer decoders that generate and then score candidate trajectories. The scoring decoder learns to mimic an oracle and predicts interpretable sub-scores representing aspects such as safety, comfort, and efficiency, enabling behavior-conditioned driving at inference. Despite its minimal design, DrivoR outperforms or matches strong contemporary baselines across NAVSIM-v1, NAVSIM-v2, and the photorealistic closed-loop HUGSIM benchmark. Our results show that a pure-transformer architecture, combined with targeted token compression, is sufficient for accurate, efficient, and adaptive end-to-end driving. Code and checkpoints will be made available via the project page.
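As an illustration of behavior-conditioned selection, the sketch below picks among candidate trajectories by re-weighting per-trajectory sub-scores at inference time. It is a hypothetical example rather than DrivoR's scoring rule: the weighted-sum aggregation, the `select_trajectory` helper, the three sub-score names, and the numeric values are all assumptions made for this example.

```python
import numpy as np

# Assumed sub-score order; the abstract names these as example aspects.
SUB_SCORES = ("safety", "comfort", "efficiency")

def select_trajectory(sub_scores, behavior_weights):
    """Return the index of the best candidate and all aggregated scores.

    sub_scores: (num_candidates, 3) array of predicted sub-scores in [0, 1].
    behavior_weights: dict mapping each sub-score name to a non-negative weight.
    Aggregation by weighted sum is an assumption of this sketch.
    """
    w = np.array([behavior_weights[name] for name in SUB_SCORES], dtype=float)
    w = w / w.sum()  # normalize so weights sum to 1
    total = sub_scores @ w
    return int(np.argmax(total)), total

# Made-up sub-scores for three candidate trajectories.
sub_scores = np.array([
    [0.95, 0.90, 0.40],  # slow and smooth
    [0.80, 0.60, 0.90],  # balanced
    [0.50, 0.30, 0.99],  # fast and aggressive
])

cautious = {"safety": 0.7, "comfort": 0.2, "efficiency": 0.1}
assertive = {"safety": 0.2, "comfort": 0.1, "efficiency": 0.7}

best_cautious, _ = select_trajectory(sub_scores, cautious)
best_assertive, _ = select_trajectory(sub_scores, assertive)
print(best_cautious, best_assertive)  # 0 1
```

The same set of candidates and predicted sub-scores yields different choices under different weightings, which is the sense in which interpretable sub-scores allow behavior to be adjusted at inference without retraining.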
Problem

Research questions and friction points this paper is trying to address.

autonomous driving
end-to-end learning
multi-camera perception
trajectory prediction
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

register tokens
vision transformer
end-to-end autonomous driving
trajectory scoring
token compression