LADY: Linear Attention for Autonomous Driving Efficiency without Transformers

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of modeling long spatiotemporal multimodal sequences for end-to-end autonomous driving on resource-constrained edge devices, this paper proposes the first fully linear-attention-driven generative model. Methodologically, it breaks the quadratic complexity bottleneck of conventional Transformers by introducing a novel lightweight linear cross-attention mechanism, enabling efficient cross-modal (camera/LiDAR) and cross-temporal interactions at linear computational complexity and overcoming the limitation of existing linear attention methods, which support only self-attention. The model jointly performs multi-sensor feature alignment and end-to-end trajectory generation. Experimentally, it achieves state-of-the-art planning performance on NAVSIM and Bench2Drive, with inference complexity invariant to the length of the historical sequence. Furthermore, it has been successfully deployed on edge platforms, significantly reducing both computational cost and memory footprint compared to prior approaches.

📝 Abstract
End-to-end paradigms have demonstrated great potential for autonomous driving, and most existing methods are built upon Transformer architectures. However, Transformers incur a quadratic attention cost, limiting their ability to model long spatial and temporal sequences, particularly on resource-constrained edge platforms. As autonomous driving inherently demands efficient temporal modeling, this challenge severely limits deployment and real-time performance. Recently, linear attention mechanisms have gained increasing attention due to their superior spatiotemporal complexity. However, existing linear attention architectures are limited to self-attention, lacking support for the cross-modal and cross-temporal interactions that are crucial for autonomous driving. In this work, we propose LADY, the first fully linear-attention-based generative model for end-to-end autonomous driving. LADY fuses long-range temporal context at inference with constant computational and memory cost, regardless of the history length of camera and LiDAR features. In addition, we introduce a lightweight linear cross-attention mechanism that enables effective cross-modal information exchange. Experiments on the NAVSIM and Bench2Drive benchmarks demonstrate that LADY achieves state-of-the-art performance with constant time and memory complexity, offering improved planning performance at significantly reduced computational cost. The model has also been deployed and validated on edge devices, demonstrating its practicality in resource-limited scenarios.
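The constant-cost property claimed in the abstract comes from the recurrent form of kernelized linear attention: instead of attending over the full history, the model maintains a fixed-size running state that absorbs each new timestep. The sketch below illustrates this general principle only (using the common ELU+1 feature map); it is not LADY's actual architecture, whose details are not given on this page.

```python
import numpy as np

def phi(x):
    # ELU(x) + 1: a common positive feature map for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_step(S, z, q, k, v):
    """One recurrent step: absorb (k, v) into the state, then read with q.
    S: (d_k, d_v) running sum of phi(k) v^T; z: (d_k,) running normalizer.
    Memory stays O(d_k * d_v) no matter how many steps have been absorbed."""
    fk, fq = phi(k), phi(q)
    S = S + np.outer(fk, v)
    z = z + fk
    out = (fq @ S) / (fq @ z + 1e-6)
    return S, z, out

d_k, d_v, T = 8, 4, 100
S, z = np.zeros((d_k, d_v)), np.zeros(d_k)
rng = np.random.default_rng(0)
for t in range(T):
    q, k, v = rng.normal(size=d_k), rng.normal(size=d_k), rng.normal(size=d_v)
    S, z, out = linear_attention_step(S, z, q, k, v)

# The state (S, z) is the same size after 100 steps as after 1,
# which is why inference cost is invariant to history length.
assert S.shape == (d_k, d_v)
```

Processing T timesteps this way costs O(T) total and O(1) per step, versus O(T^2) for softmax attention over the full history.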
Problem

Research questions and friction points this paper is trying to address.

Efficient long-range temporal modeling for autonomous driving under tight compute budgets
Cross-modal information exchange, which existing linear attention architectures (self-attention only) do not support
Quadratic cost of Transformer attention, which hinders deployment on resource-constrained platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear attention replaces Transformer attention for efficiency
Constant computational cost regardless of history length
Lightweight cross-attention enables cross-modal information exchange
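The cross-attention innovation can be pictured with the standard kernelized trick: one modality's keys and values are summarized into a small d × d_v matrix once, so every query reads a fixed-size summary rather than attending over all tokens, making the cost linear in token count. This is a generic sketch of that idea; the camera/LiDAR pairing and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def phi(x):
    # ELU(x) + 1 positive feature map
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_cross_attention(Q, K, V, eps=1e-6):
    """Cross-attention in O(N + M) instead of O(N * M).
    Q: (N, d) queries from one modality (e.g. camera tokens);
    K: (M, d), V: (M, d_v) from another (e.g. LiDAR tokens)."""
    fQ, fK = phi(Q), phi(K)
    KV = fK.T @ V          # (d, d_v): fixed-size summary of the other modality
    z = fK.sum(axis=0)     # (d,): normalizer
    return (fQ @ KV) / (fQ @ z + eps)[:, None]

cam = np.random.default_rng(1).normal(size=(16, 8))    # hypothetical camera tokens
lidar = np.random.default_rng(2).normal(size=(64, 8))  # hypothetical LiDAR tokens
fused = linear_cross_attention(cam, lidar, lidar)
assert fused.shape == (16, 8)  # one fused vector per camera token
```

Because the (d, d_v) summary is independent of the number of LiDAR tokens, doubling the point-cloud token count does not change the per-query cost.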
Jihao Huang
College of Control Science and Engineering, Zhejiang University
multi-agent · control theory
Xi Xia
Udeer AI, Hangzhou, China
Zhiyuan Li
Yuanshi Intelligence, Shenzhen, China
Tianle Liu
Ph.D. in statistics at Harvard University
applied probability · statistical inference · machine learning
Jingke Wang
Udeer AI, Hangzhou, China
Junbo Chen
Udeer AI, Hangzhou, China
Tengju Ye
Udeer AI, Hangzhou, China