🤖 AI Summary
To address the insufficient perception-decision coordination, error accumulation in modular pipelines, and poor real-time performance of end-to-end models in complex urban autonomous driving scenarios, this paper proposes ME³-BEV: an end-to-end, real-time decision-making framework integrating bird's-eye-view (BEV) perception with deep reinforcement learning. Its core innovations include: (1) Mamba-BEV, a spatiotemporal feature extraction network based on the Mamba architecture that efficiently models long-range spatiotemporal dependencies; (2) a semantic segmentation visualization mechanism that enhances model interpretability; and (3) a unified joint training paradigm for BEV representation learning and policy optimization. Evaluated on the CARLA simulator, ME³-BEV achieves a 28.6% reduction in collision rate and a 21.4% improvement in trajectory tracking accuracy while maintaining real-time inference latency below 50 ms. These results demonstrate its effectiveness in enabling safe, interpretable, efficient, and robust autonomous driving in highly dynamic urban environments.
📝 Abstract
Autonomous driving systems face significant challenges in perceiving complex environments and making real-time decisions. Traditional modular approaches, while offering interpretability, suffer from error propagation and coordination issues, whereas end-to-end learning systems can simplify the design but face computational bottlenecks. This paper presents a novel approach to autonomous driving using deep reinforcement learning (DRL) that integrates bird's-eye view (BEV) perception for enhanced real-time decision-making. We introduce the Mamba-BEV model, an efficient spatio-temporal feature extraction network that combines BEV-based perception with the Mamba framework for temporal feature modeling. This integration allows the system to encode vehicle surroundings and road features in a unified coordinate system and accurately model long-range dependencies. Building on this, we propose the ME³-BEV framework, which utilizes the Mamba-BEV model as a feature input for end-to-end DRL, achieving superior performance in dynamic urban driving scenarios. We further enhance the interpretability of the model by visualizing high-dimensional features through semantic segmentation, providing insight into the learned representations. Extensive experiments on the CARLA simulator demonstrate that ME³-BEV outperforms existing models across multiple metrics, including collision rate and trajectory accuracy, offering a promising solution for real-time autonomous driving.
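To give a rough intuition for the temporal modeling idea behind Mamba-style feature extraction, the sketch below runs a linear state-space recurrence over a sequence of (hypothetical) flattened BEV feature vectors. This is a minimal, non-selective illustration in plain NumPy, not the paper's implementation: real Mamba blocks make the state-space parameters input-dependent ("selective") and use hardware-efficient parallel scans, and all dimensions and parameter values here are invented for illustration.

```python
import numpy as np

# Hypothetical dimensions: T BEV frames, each flattened to a d_model vector.
T, d_model, d_state = 16, 8, 4

rng = np.random.default_rng(0)
x = rng.standard_normal((T, d_model))  # sequence of per-frame BEV features

# Fixed (non-selective) SSM parameters. A decaying state matrix lets
# information from old frames persist, giving long-range temporal context.
A = np.eye(d_state) * 0.9                        # state transition (decay)
B = rng.standard_normal((d_state, d_model)) / d_model  # input projection
C = rng.standard_normal((d_model, d_state)) / d_state  # output projection

def ssm_scan(x_seq):
    """Sequential scan: h_t = A h_{t-1} + B x_t,  y_t = C h_t."""
    h = np.zeros(d_state)
    ys = []
    for x_t in x_seq:
        h = A @ h + B @ x_t   # fold the current frame into the hidden state
        ys.append(C @ h)      # emit a temporally-contextualized feature
    return np.stack(ys)

y = ssm_scan(x)
print(y.shape)  # one output feature vector per input frame: (16, 8)
```

In a DRL pipeline like the one described, each `y_t` would play the role of a state representation fed to the policy network, replacing raw per-frame features with ones that summarize the recent driving context.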