FastBEV++: Fast by Algorithm, Deployable by Design

📅 2025-12-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

BEV perception faces a trade-off between high accuracy and real-time deployment on embedded automotive platforms, driven mainly by the computational cost of view transformation and its reliance on custom CUDA kernels. FastBEV++ addresses this by decomposing view transformation into three standard, platform-agnostic operators (Index, Gather, and Reshape), which removes the dependency on custom kernels and allows TensorRT-native deployment. The framework further integrates end-to-end depth-aware fusion and temporal feature aggregation, supported by data augmentation. On nuScenes, it reports 0.359 NDS, which the authors describe as state of the art, while running at more than 134 FPS on a Tesla T4 GPU. The work aims to close the gap between accuracy and latency for production-ready, camera-only BEV perception systems.

📝 Abstract

The advancement of camera-only Bird's-Eye-View (BEV) perception is currently impeded by a fundamental tension between state-of-the-art performance and on-vehicle deployment tractability. This bottleneck stems from a deep-rooted dependency on computationally prohibitive view transformations and bespoke, platform-specific kernels. This paper introduces FastBEV++, a framework engineered to reconcile this tension, demonstrating that high performance and deployment efficiency can be achieved together via two guiding principles: Fast by Algorithm and Deployable by Design. We realize the "Deployable by Design" principle through a novel view transformation paradigm that decomposes the monolithic projection into a standard Index-Gather-Reshape pipeline. Enabled by a deterministic pre-sorting strategy, this transformation is executed entirely with elementary, operator-native primitives (e.g., Gather, Matrix Multiplication), which eliminates the need for specialized CUDA kernels and ensures full TensorRT-native portability. Concurrently, our framework is "Fast by Algorithm", leveraging this decomposed structure to integrate an end-to-end, depth-aware fusion mechanism. This jointly learned depth modulation, further bolstered by temporal aggregation and robust data augmentation, significantly enhances the geometric fidelity of the BEV representation. Empirical validation on the nuScenes benchmark corroborates the efficacy of our approach. FastBEV++ establishes a new state-of-the-art 0.359 NDS while maintaining real-time performance, exceeding 134 FPS on automotive-grade hardware (e.g., Tesla T4). By offering a solution that is free of custom plugins yet highly accurate, FastBEV++ presents a mature and scalable design philosophy for production autonomous systems. The code is released at: https://github.com/ymlab/advanced-fastbev
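To make the Index-Gather-Reshape idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names (`build_index`, `view_transform`), the tensor layouts, and the zero-padding row for unseen voxels are all assumptions made for illustration. The lookup table is filled with random indices where a real system would project voxel centers through the camera calibration, and the paper's deterministic pre-sorting strategy is not reproduced.

```python
import numpy as np

def build_index(num_voxels, num_pixels, rng):
    """Index step (done once, offline): one source pixel per BEV voxel.

    ASSUMPTION: random indices stand in for the real camera projection.
    Voxels that no camera sees point at an extra all-zero padding row,
    whose position is `num_pixels`.
    """
    idx = rng.integers(0, num_pixels, size=num_voxels)
    visible = rng.random(num_voxels) < 0.8
    return np.where(visible, idx, num_pixels)

def view_transform(img_feats, index, bev_shape):
    """Gather + Reshape steps (done per frame), using only standard ops.

    img_feats: (num_cams, H, W, C) image features
    index:     (X * Y * Z,) table from build_index, ordered x, then y, then z
    bev_shape: (X, Y, Z) voxel grid size
    """
    n_cam, h, w, c = img_feats.shape
    flat = img_feats.reshape(n_cam * h * w, c)
    # Append the zero row that unseen voxels point to.
    flat = np.concatenate([flat, np.zeros((1, c), dtype=flat.dtype)], axis=0)
    gathered = flat[index]                   # Gather: (X*Y*Z, C)
    x, y, z = bev_shape
    return gathered.reshape(x, y, z * c)     # Reshape: fold height into channels

rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4, 8, 16)).astype(np.float32)
index = build_index(10 * 10 * 2, 6 * 4 * 8, rng)
bev = view_transform(feats, index, (10, 10, 2))
print(bev.shape)
```

Because the table depends only on calibration and grid geometry, it can be computed ahead of time, leaving only a gather and a reshape at inference. That is the property that makes this style of transform expressible without custom kernels.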

Problem

Research questions and friction points this paper is trying to address.

Resolves tension between high performance and deployment efficiency in BEV perception.

Eliminates need for custom CUDA kernels with TensorRT-native view transformation.

Enhances BEV geometric fidelity via depth-aware fusion and temporal aggregation.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes view transformation into a standard Index-Gather-Reshape pipeline.

Uses deterministic pre-sorting for TensorRT-native portability without custom kernels.

Integrates end-to-end depth-aware fusion with temporal aggregation.
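The depth-aware fusion and temporal aggregation points can be illustrated with a small NumPy sketch. This shows one plausible reading, not the paper's actual mechanism. It assumes each pixel predicts a distribution over discrete depth bins, and that each voxel's gathered feature is scaled by the probability of the bin that voxel falls into. Temporal aggregation is shown as a plain channel-wise concatenation of BEV maps, assumed to be already aligned to the current ego frame. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def depth_modulated_gather(feats, depth_logits, pix_idx, bin_idx):
    """Gather per-voxel features and weight them by predicted depth.

    feats:        (P, C) flattened image features
    depth_logits: (P, D) per-pixel scores over D depth bins (ASSUMED head)
    pix_idx:      (V,) source pixel for each voxel
    bin_idx:      (V,) depth bin that each voxel falls into along its ray
    """
    prob = softmax(depth_logits, axis=-1)      # (P, D)
    weight = prob[pix_idx, bin_idx]            # (V,), also a gather
    return feats[pix_idx] * weight[:, None]    # (V, C)

def aggregate_temporal(bev_frames):
    """Concatenate BEV maps of shape (X, Y, C) along the channel axis.

    ASSUMPTION: past frames were already warped into the current frame.
    """
    return np.concatenate(bev_frames, axis=-1)

feats = np.ones((5, 3), dtype=np.float32)
logits = np.zeros((5, 4), dtype=np.float32)    # uniform over 4 bins
out = depth_modulated_gather(feats, logits, np.array([0, 2, 4]), np.array([1, 0, 3]))
print(out.shape)
```

In this sketch the depth weighting is itself an indexed lookup followed by an elementwise multiply, so it stays within the same family of standard operators as the gather-based view transform.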
