UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs

📅 2025-11-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Transformer-based models incur prohibitive O(n²) computational complexity when processing long sequences in autonomous driving, while multimodal and temporal fusion typically rely on hand-crafted, explicit modules. To address these limitations, we propose UniLION—the first unified architecture for autonomous driving that integrates linear-group RNNs. UniLION replaces self-attention with linear-complexity recurrent modeling, enabling native support for LiDAR point clouds, multi-view images, and sequential data without dedicated fusion modules. It jointly models multiple modalities and tasks—including 3D detection, tracking, occupancy prediction, BEV segmentation, motion forecasting, and end-to-end planning—within a single framework. The architecture supports flexible configurations (e.g., LiDAR-only, multimodal, or temporal fusion), eliminating the conventional attention-plus-explicit-fusion paradigm. On key benchmarks, UniLION achieves state-of-the-art or competitive performance while significantly improving efficiency for long-sequence processing and enhancing model generalization.
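The key efficiency claim above — replacing quadratic self-attention with a linear-complexity recurrence over grouped features — can be illustrated with a minimal sketch. This is not the UniLION implementation; the gating scheme, the per-group state reset, and all function names here are illustrative assumptions, showing only why a linear RNN scan costs O(n·d) in sequence length rather than O(n²).

```python
import numpy as np

def linear_group_rnn(x, decay, groups):
    """Illustrative linear RNN applied independently to groups of features.

    x:      (n, d) array of n feature vectors (e.g., voxel or image tokens)
    decay:  (n, d) per-step decay gates in (0, 1)
    groups: list of (start, end) index ranges; the recurrent state is
            reset at each group boundary (hypothetical grouping scheme)

    Each step costs O(d), so a full pass is O(n * d) -- linear in the
    sequence length n, unlike O(n^2) pairwise self-attention.
    """
    n, d = x.shape
    h = np.zeros((n, d))
    for start, end in groups:
        state = np.zeros(d)  # recurrence restarts for each group
        for t in range(start, end):
            # simple gated linear recurrence: blend old state with new input
            state = decay[t] * state + (1.0 - decay[t]) * x[t]
            h[t] = state
    return h
```

Because the recurrence carries a fixed-size state instead of attending over all previous tokens, LiDAR tokens, image tokens, and past frames can in principle be concatenated into one long sequence and processed uniformly, which is the intuition behind dropping dedicated fusion modules.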

📝 Abstract
Although transformers have demonstrated remarkable capabilities across various domains, their quadratic attention mechanisms introduce significant computational overhead when processing long-sequence data. In this paper, we present a unified autonomous driving model, UniLION, which efficiently handles large-scale LiDAR point clouds, high-resolution multi-view images, and even temporal sequences based on the linear group RNN operator (i.e., performs linear RNN for grouped features). Remarkably, UniLION serves as a single versatile architecture that can seamlessly support multiple specialized variants (i.e., LiDAR-only, temporal LiDAR, multi-modal, and multi-modal temporal fusion configurations) without requiring explicit temporal or multi-modal fusion modules. Moreover, UniLION consistently delivers competitive and even state-of-the-art performance across a wide range of core tasks, including 3D perception (e.g., 3D object detection, 3D object tracking, 3D occupancy prediction, BEV map segmentation), prediction (e.g., motion prediction), and planning (e.g., end-to-end planning). This unified paradigm naturally simplifies the design of multi-modal and multi-task autonomous driving systems while maintaining superior performance. Ultimately, we hope UniLION offers a fresh perspective on the development of 3D foundation models in autonomous driving. Code is available at https://github.com/happinesslz/UniLION
Problem

Research questions and friction points this paper is trying to address.

Quadratic attention cost makes transformers prohibitively expensive on large-scale LiDAR point clouds and high-resolution multi-view images
Multi-modal and temporal fusion conventionally depend on hand-crafted, explicit fusion modules
A single unified model must still remain competitive across 3D perception, prediction, and planning tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses linear group RNNs for efficient sequence processing
Unified architecture handles multi-modal data without fusion modules
Supports multiple autonomous driving tasks with competitive performance
Zhe Liu
School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), Wuhan, China
Jinghua Hou
School of Electronic Information and Communications, Huazhong University of Science and Technology (HUST), Wuhan, China
Xiaoqing Ye
School of Computing and Artificial Intelligence, Southwest Jiaotong University
Granular Computing, Recommender System, Business Intelligence
Jingdong Wang
Baidu Inc., Beijing, China
Hengshuang Zhao
The University of Hong Kong
Computer Vision, Machine Learning, Artificial Intelligence
Xiang Bai
Huazhong University of Science and Technology (HUST)
Computer Vision, OCR