🤖 AI Summary
This work addresses the trade-off between end-to-end latency and prediction accuracy in autonomous driving by proposing a convolutional neural network that accepts multiple input resolutions. The architecture uses per-resolution batch normalization layers together with a resolution retargeting training strategy, which enables multi-resolution training without access to the original training data. At inference time, the model selects an input resolution that fits the available latency budget. Experiments on the CARLA urban driving benchmark show that, compared to fixed-resolution baselines, the proposed approach consistently reduces per-route safety incidents (lane invasions, red-light infractions, and collisions).
📝 Abstract
Latency-accuracy tradeoffs are fundamental in real-time applications of deep neural networks (DNNs) for cyber-physical systems. In autonomous driving in particular, safety depends on both prediction quality and the end-to-end delay from sensing to actuation. We observe that (1) once end-to-end latency is accounted for, the best network configuration varies with scene context and compute availability; and (2) a single fixed-resolution model therefore becomes suboptimal as conditions change.
We present a multi-resolution, end-to-end deep neural network for the CARLA urban driving challenge using monocular camera input. Our approach employs a convolutional neural network (CNN) that supports multiple input resolutions through per-resolution batch normalization, enabling runtime selection of a suitable input scale under a latency budget. It also introduces resolution retargeting, which allows multi-resolution training without access to the original training dataset.
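As a rough illustration of runtime selection under a latency budget, the sketch below picks the largest input resolution whose profiled inference latency fits the budget. It is not the paper's selection policy: the function name `select_resolution`, the latency table `LATENCY_MS`, and all numbers in it are hypothetical, and the rule assumes that a larger input is preferable whenever it fits, which the abstract does not state (it notes that the best configuration also depends on scene context).

```python
from typing import Dict

# Hypothetical profiled inference latencies (ms) per square input resolution.
# Illustrative placeholders only, not measurements from the paper.
LATENCY_MS: Dict[int, float] = {128: 8.0, 192: 14.0, 256: 22.0, 384: 45.0}


def select_resolution(budget_ms: float,
                      latency_ms: Dict[int, float] = LATENCY_MS) -> int:
    """Return the largest resolution whose profiled latency fits the budget.

    Assumption: larger inputs are preferred whenever they fit. If no
    resolution fits, fall back to the smallest one so that the controller
    still receives a prediction.
    """
    feasible = [res for res, ms in latency_ms.items() if ms <= budget_ms]
    if feasible:
        return max(feasible)
    return min(latency_ms)


if __name__ == "__main__":
    for budget in (5.0, 15.0, 30.0, 100.0):
        print(f"budget={budget:6.1f} ms -> resolution {select_resolution(budget)}")
```

In a network with per-resolution batch normalization, the chosen resolution would also determine which set of normalization statistics is used for the forward pass; that part is omitted here to keep the sketch free of framework dependencies.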
We implement and evaluate our multi-resolution end-to-end CNN in CARLA to explore the latency-safety frontier. Results show consistent improvements in per-route safety metrics (lane invasions, red-light infractions, and collisions) relative to fixed-resolution baselines.