🤖 AI Summary
To address the challenge of modeling long spatiotemporal multimodal sequences for end-to-end autonomous driving on resource-constrained edge devices, this paper proposes the first fully linear-attention-driven generative model. Methodologically, it breaks the quadratic complexity bottleneck of conventional Transformers by introducing a novel lightweight linear cross-attention mechanism that enables efficient cross-modal (camera/LiDAR) and cross-temporal interaction at linear computational complexity, overcoming the limitation of existing linear attention methods, which support only self-attention. The model jointly performs multi-sensor feature alignment and end-to-end trajectory generation. Experimentally, it achieves state-of-the-art planning performance on NAVSIM and Bench2Drive, with inference cost invariant to the length of the historical sequence. Furthermore, it has been successfully deployed on edge platforms, significantly reducing both computational cost and memory footprint compared to prior approaches.
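The paper does not spell out its cross-attention mechanism here, but the general kernelized linear-attention idea it builds on can be sketched. The snippet below is an illustrative toy (not LADY's actual design): queries come from one modality and keys/values from another, and a positive feature map replaces softmax so the cost is O((N+M)·d²) instead of the O(N·M·d) of standard cross-attention. The feature map choice (elu+1) and the camera/LiDAR naming are assumptions for illustration.

```python
import numpy as np

def feature_map(x):
    # Positive feature map elu(x) + 1, a common choice in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_cross_attention(q, k, v):
    """Kernelized cross-attention: q from one modality (N, d),
    k/v from another (M, d)/(M, d_v). Never materializes the
    (N, M) attention matrix."""
    qf, kf = feature_map(q), feature_map(k)       # (N, d), (M, d)
    kv = kf.T @ v                                 # (d, d_v) summary of the k/v stream
    z = kf.sum(axis=0)                            # (d,) normalizer
    return (qf @ kv) / (qf @ z)[:, None]          # (N, d_v)

rng = np.random.default_rng(0)
cam = rng.standard_normal((8, 16))     # hypothetical camera queries
lidar = rng.standard_normal((32, 16))  # hypothetical LiDAR keys/values
out = linear_cross_attention(cam, lidar, lidar)
```

Because the feature map is strictly positive, each output row is a convex combination of the value rows, mirroring the role softmax plays in standard attention.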
📝 Abstract
End-to-end paradigms have demonstrated great potential for autonomous driving, and most existing methods are built upon Transformer architectures. However, Transformers incur a quadratic attention cost, limiting their ability to model long spatial and temporal sequences, particularly on resource-constrained edge platforms. Since autonomous driving inherently demands efficient temporal modeling, this challenge severely limits deployment and real-time performance. Recently, linear attention mechanisms have gained increasing attention due to their favorable time and memory complexity. However, existing linear attention architectures are limited to self-attention and lack support for cross-modal and cross-temporal interactions, both of which are crucial for autonomous driving. In this work, we propose LADY, the first fully linear-attention-based generative model for end-to-end autonomous driving. LADY fuses long-range temporal context at inference with constant computational and memory cost, regardless of the history length of the camera and LiDAR features. In addition, we introduce a lightweight linear cross-attention mechanism that enables effective cross-modal information exchange. Experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that LADY achieves state-of-the-art performance with constant time and memory complexity, offering improved planning performance at significantly reduced computational cost. The model has also been deployed and validated on edge devices, demonstrating its practicality in resource-limited scenarios.
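The constant-cost temporal fusion claim rests on a standard property of linear attention: all past keys/values can be folded into a fixed-size recurrent state, so each new frame is processed in O(d²) time and memory no matter how long the history is. The sketch below illustrates that property in a generic form; the class name, state layout, and feature map are assumptions for illustration, not LADY's implementation.

```python
import numpy as np

def phi(x):
    # Positive feature map elu(x) + 1
    return np.where(x > 0, x + 1.0, np.exp(x))

class StreamingLinearAttention:
    """Keeps a fixed-size state (S, z) summarizing the entire history
    of keys/values; per-step cost is independent of history length."""
    def __init__(self, d, d_v):
        self.S = np.zeros((d, d_v))   # running sum of outer(phi(k_t), v_t)
        self.z = np.zeros(d)          # running sum of phi(k_t)

    def step(self, q, k, v):
        fk = phi(k)
        self.S += np.outer(fk, v)     # fold the new frame into the state
        self.z += fk
        fq = phi(q)
        return (fq @ self.S) / (fq @ self.z)

rng = np.random.default_rng(1)
d, d_v, T = 8, 4, 20
qs = rng.standard_normal((T, d))
ks = rng.standard_normal((T, d))
vs = rng.standard_normal((T, d_v))
sa = StreamingLinearAttention(d, d_v)
stream_out = np.stack([sa.step(qs[t], ks[t], vs[t]) for t in range(T)])
```

The streaming output is exactly equal to recomputing causal linear attention over the full history at every step, which is what makes inference cost invariant to sequence length.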