Robustness Is a Function, Not a Number: A Factorized Comprehensive Study of OOD Robustness in Vision-Based Driving

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitation of representing out-of-distribution (OOD) robustness in autonomous driving systems as a single scalar, which obscures specific failure mechanisms. The authors instead model robustness as a multidimensional function by factorizing driving environments along five axes: scene type, season, weather, time of day, and traffic participants. They systematically evaluate visual driving policies (fully connected, CNN, and ViT) under perturbations across these dimensions within the VISTA closed-loop simulation platform. By training lightweight ViT heads on frozen foundation-model features, they show that ViTs significantly outperform CNNs and fully connected networks. Foundation-model features maintain success rates above 85% even under triple perturbations, and targeted training combinations (e.g., rural + summer) or multi-environment data effectively enhance both overall and worst-case OOD robustness, offering actionable design principles for OOD-capable driving policies.

📝 Abstract
Out-of-distribution (OOD) robustness in autonomous driving is often reduced to a single number, hiding what breaks a policy. We decompose environments along five axes: scene (rural/urban), season, weather, time (day/night), and agent mix. We then measure performance under controlled $k$-factor perturbations ($k \in \{0,1,2,3\}$). Using closed-loop control in VISTA, we benchmark FC, CNN, and ViT policies, train compact ViT heads on frozen foundation-model (FM) features, and vary ID support in scale, diversity, and temporal context. (1) ViT policies are markedly more OOD-robust than comparably sized CNN/FC policies, and FM features yield state-of-the-art success at a latency cost. (2) Naive temporal inputs (multi-frame) do not beat the best single-frame baseline. (3) The largest single-factor drops are rural $\rightarrow$ urban and day $\rightarrow$ night ($\sim 31\%$ each); actor swaps cost $\sim 10\%$ and moderate rain $\sim 7\%$; season shifts can be drastic, and combining a time flip with other changes further degrades performance. (4) FM-feature policies stay above $85\%$ under three simultaneous changes; non-FM single-frame policies take a large first-shift hit, and all non-FM models fall below $50\%$ by three changes. (5) Interactions are non-additive: some pairings partially offset, whereas season-time combinations are especially harmful. (6) Training on winter/snow is most robust to single-factor shifts, while a rural+summer baseline gives the best overall OOD performance. (7) Scaling traces/views improves robustness ($+11.8$ points from $5$ to $14$ traces), yet targeted exposure to hard conditions can substitute for scale. (8) Using multiple ID environments broadens coverage and strengthens weak cases (urban OOD $60.6\% \rightarrow 70.1\%$) with a small ID drop; single-ID training preserves peak performance but only in a narrow domain. These results yield actionable design rules for OOD-robust driving policies.
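The abstract's evaluation protocol (shift $k$ of five environment axes at a time and score each combination) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the factor names and value lists are taken loosely from the abstract, and `evaluate` stands in for a closed-loop VISTA rollout.

```python
from itertools import combinations

# Five environment axes from the paper's factorization (values illustrative).
FACTORS = {
    "scene": ["rural", "urban"],
    "season": ["summer", "winter"],
    "weather": ["clear", "rain", "snow"],
    "time": ["day", "night"],
    "agents": ["sparse", "dense"],
}

def k_factor_perturbations(k):
    """All choices of k axes to shift away from the in-distribution setting."""
    return list(combinations(sorted(FACTORS), k))

def robustness_profile(evaluate, max_k=3):
    """Robustness as a function, not a number: a success-rate table indexed
    by the combination of perturbed axes (k = 0 is the ID baseline)."""
    profile = {}
    for k in range(max_k + 1):
        for combo in k_factor_perturbations(k):
            profile[combo] = evaluate(combo)  # e.g. closed-loop success rate
    return profile
```

With five axes this yields $\binom{5}{0}+\binom{5}{1}+\binom{5}{2}+\binom{5}{3} = 1+5+10+10 = 26$ evaluation cells, which is what lets the paper report per-factor drops and non-additive pairings instead of one scalar.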
Problem

Research questions and friction points this paper is trying to address.

OOD robustness
autonomous driving
distribution shift
environmental factors
vision-based driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

OOD robustness
factorized evaluation
foundation models
vision-based driving
non-additive interactions