NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes NoRD, a vision-language-action (VLA) driving model that achieves efficient end-to-end policy learning without relying on expensive reasoning annotations. Addressing the high training costs of existing VLA approaches, which depend on large-scale datasets and densely annotated reasoning traces, NoRD operates effectively on significantly smaller datasets (over 40% less data). The method incorporates Dr. GRPO, a recent variant of the GRPO algorithm designed to mitigate the difficulty bias that causes optimization failure in small-data regimes. Evaluated on the Waymo and NAVSIM benchmarks, NoRD matches or approaches the performance of current state-of-the-art methods while substantially improving training efficiency and eliminating the need for costly reasoning labels.

📝 Abstract
Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on <60% of the data with no reasoning annotations, resulting in 3× fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NoRD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems.
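The difficulty bias the abstract refers to can be illustrated with a small sketch. This is not the paper's code: it is a minimal, hedged comparison of how standard GRPO normalizes group rewards by both mean and standard deviation, while Dr. GRPO (as proposed for LLMs and adopted here) drops the per-group standard-deviation division. Function names and the toy reward values are illustrative assumptions.

```python
import numpy as np

def grpo_advantages(rewards):
    # Standard GRPO: center the group's rewards, then divide by the
    # group std. Low-variance groups (scenarios that are too easy or
    # too hard) get their tiny reward differences blown up -- this is
    # the "difficulty bias" described in the abstract.
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def dr_grpo_advantages(rewards):
    # Dr. GRPO variant: center only, no std division, so every group
    # contributes advantages on the same reward scale.
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()

if __name__ == "__main__":
    # A nearly-solved scenario: rollout rewards differ by only 0.02,
    # yet standard GRPO scales the advantages up to +/- 1.
    easy = [0.51, 0.49]
    print(grpo_advantages(easy))     # large magnitudes despite tiny gap
    print(dr_grpo_advantages(easy))  # stays at +/- 0.01
```

Running the toy example shows why a few near-deterministic scenarios can dominate the gradient signal under standard GRPO, which is the failure mode Dr. GRPO is meant to remove.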
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action
autonomous driving
data efficiency
reasoning annotations
difficulty bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action
data-efficient learning
reasoning-free training
Dr. GRPO
difficulty bias