New Spiking Architecture for Multi-Modal Decision-Making in Autonomous Vehicles

📅 2025-12-01

🤖 AI Summary

Transformer-based multimodal models are computationally expensive and hard to deploy on edge devices, which limits their use for high-level decision-making in autonomous driving. This paper proposes an end-to-end multimodal reinforcement learning framework for real-time decision-making. The method introduces a Transformer-like architecture built on ternary spiking neurons that fuses heterogeneous inputs, including camera images, LiDAR point clouds, and vehicle heading information. It further incorporates spike-timing-aware mechanisms and a cross-attention module to preserve multimodal representation fidelity while reducing computational complexity. Evaluation in the Highway Environment reports comparable or superior decision accuracy across multiple tasks, with a 42% reduction in inference latency and a 58% decrease in power consumption, consistent with the real-time and energy-efficiency constraints of in-vehicle edge platforms.

📝 Abstract

This work proposes an end-to-end multi-modal reinforcement learning framework for high-level decision-making in autonomous vehicles. The framework integrates heterogeneous sensory input, including camera images, LiDAR point clouds, and vehicle heading information, through a cross-attention transformer-based perception module. Although transformers have become the backbone of modern multi-modal architectures, their high computational cost limits their deployment in resource-constrained edge environments. To overcome this challenge, we propose a spiking temporal-aware transformer-like architecture that uses ternary spiking neurons for computationally efficient multi-modal fusion. Comprehensive evaluations across multiple tasks in the Highway Environment demonstrate the effectiveness and efficiency of the proposed approach for real-time autonomous decision-making.
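The abstract names ternary spiking neurons but does not spell out their dynamics. As a rough illustration only, the sketch below assumes a leaky integrate-and-fire unit with a symmetric threshold that emits -1, 0, or +1 at each time step. The class name, the `threshold` and `decay` values, and the soft-reset rule are all hypothetical choices, not details taken from the paper. The helper `ternary_dot` shows one reason such spikes can be cheap: a weighted sum over ternary inputs needs only additions and subtractions.

```python
from dataclasses import dataclass


@dataclass
class TernaryLIFNeuron:
    """Leaky integrate-and-fire unit that emits -1, 0, or +1 per time step.

    All parameter names and defaults are illustrative assumptions,
    not values from the paper.
    """
    threshold: float = 1.0   # firing threshold, applied symmetrically (+/-)
    decay: float = 0.5       # fraction of membrane potential kept each step
    potential: float = 0.0   # membrane potential (internal state)

    def step(self, input_current: float) -> int:
        # Leak the previous potential, then integrate the new input.
        self.potential = self.decay * self.potential + input_current
        if self.potential >= self.threshold:
            self.potential -= self.threshold   # soft reset after a positive spike
            return 1
        if self.potential <= -self.threshold:
            self.potential += self.threshold   # soft reset after a negative spike
            return -1
        return 0


def ternary_dot(spikes, weights):
    """Weighted sum over ternary spikes using only additions and subtractions."""
    total = 0.0
    for s, w in zip(spikes, weights):
        if s == 1:
            total += w
        elif s == -1:
            total -= w
        # s == 0 contributes nothing, so the weight is skipped entirely
    return total


neuron = TernaryLIFNeuron()
currents = [0.6, 0.8, -2.5, -0.4, 0.1]   # made-up input currents
spikes = [neuron.step(c) for c in currents]
print(spikes)
```

A binary spiking neuron would discard the sign of the input; the ternary variant keeps it while still avoiding multiplications in the downstream weighted sum.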

Problem

Research questions and friction points this paper is trying to address.

Develops a spiking transformer for efficient multi-modal fusion in autonomous vehicles

Integrates camera, LiDAR, and heading data for real-time decision-making

Reduces computational cost for deployment in resource-constrained edge environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spiking transformer-like architecture for multi-modal fusion

Ternary spiking neurons enable computational efficiency

Cross-attention transformer integrates camera, LiDAR, heading data
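To make the cross-attention fusion in the list above concrete, here is a minimal sketch of standard scaled dot-product cross-attention in plain Python, where camera tokens act as queries over LiDAR and heading tokens. It is not the paper's spiking module: it uses dense floating-point arithmetic, omits learned query/key/value projections, and the token values, dimensions, and the choice of camera tokens as queries are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention on lists of equal-length vectors.

    Each query attends over all keys and returns a weighted average of
    the corresponding values.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors, one output dimension at a time.
        fused = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(fused)
    return outputs


# Hypothetical 2-dimensional token embeddings for each modality.
camera_tokens = [[1.0, 0.0], [0.0, 1.0]]
lidar_tokens = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
heading_token = [[0.3, 0.7]]

# Camera tokens query the concatenated LiDAR and heading tokens.
context = lidar_tokens + heading_token
fused = cross_attention(camera_tokens, context, context)
print(fused)
```

Each fused vector is a convex combination of the context tokens, so the output keeps one vector per camera token while mixing in information from the other two modalities.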

🔎 Similar Papers

Passenger hazard perception based on EEG signals for highly automated driving vehicles