🤖 AI Summary
Transformer-based models incur prohibitive O(n²) computational complexity when processing long sequences in autonomous driving, while multimodal and temporal fusion typically rely on hand-crafted, explicit modules. To address these limitations, we propose UniLION—the first unified architecture for autonomous driving built on linear group RNNs. UniLION replaces self-attention with linear-complexity recurrent modeling, enabling native support for LiDAR point clouds, multi-view images, and sequential data without dedicated fusion modules. It jointly models multiple modalities and tasks—including 3D detection, tracking, occupancy prediction, BEV segmentation, motion forecasting, and end-to-end planning—within a single framework. The architecture supports flexible configurations (e.g., LiDAR-only, multimodal, or temporal fusion), eliminating the conventional attention-plus-explicit-fusion paradigm. On key benchmarks, UniLION achieves state-of-the-art or competitive performance while significantly improving efficiency for long-sequence processing and enhancing model generalization.
📝 Abstract
Although transformers have demonstrated remarkable capabilities across various domains, their quadratic attention mechanisms introduce significant computational overhead when processing long-sequence data. In this paper, we present a unified autonomous driving model, UniLION, which efficiently handles large-scale LiDAR point clouds, high-resolution multi-view images, and even temporal sequences based on the linear group RNN operator (i.e., it performs a linear RNN over grouped features). Remarkably, UniLION serves as a single versatile architecture that can seamlessly support multiple specialized variants (i.e., LiDAR-only, temporal LiDAR, multi-modal, and multi-modal temporal fusion configurations) without requiring explicit temporal or multi-modal fusion modules. Moreover, UniLION consistently delivers competitive and even state-of-the-art performance across a wide range of core tasks, including 3D perception (e.g., 3D object detection, 3D object tracking, 3D occupancy prediction, BEV map segmentation), prediction (e.g., motion prediction), and planning (e.g., end-to-end planning). This unified paradigm naturally simplifies the design of multi-modal and multi-task autonomous driving systems while maintaining superior performance. Ultimately, we hope UniLION offers a fresh perspective on the development of 3D foundation models in autonomous driving. Code is available at https://github.com/happinesslz/UniLION.
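To make the "linear RNN over grouped features" idea concrete, below is a minimal NumPy sketch of the core computation pattern: the feature sequence is partitioned into fixed-size groups, and a linear recurrence is run independently inside each group, giving O(n) cost in sequence length rather than the O(n²) of self-attention. The function name, the fixed-size grouping, and the simple recurrence parameterization (`h_t = A·h_{t-1} + B·x_t`) are illustrative assumptions for exposition, not the authors' actual operator or implementation.

```python
import numpy as np

def linear_group_rnn(x, group_size, A, B):
    """Toy linear group RNN (illustrative, not the paper's operator).

    x : (n, d) sequence of features (e.g., voxel or image tokens).
    A, B : (d, d) recurrence matrices, shared across groups.
    The recurrence h_t = A @ h_{t-1} + B @ x_t runs independently
    within each group of `group_size` consecutive features, so the
    total cost is linear in n.
    """
    n, d = x.shape
    out = np.zeros_like(x)
    for start in range(0, n, group_size):
        h = np.zeros(d)  # state resets at each group boundary
        for t in range(start, min(start + group_size, n)):
            h = A @ h + B @ x[t]  # linear recurrence, no softmax attention
            out[t] = h
    return out

# Example: 8 feature vectors of dim 4, processed in groups of 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
A = 0.9 * np.eye(4)  # simple decaying state transition (assumed for the demo)
B = np.eye(4)
h = linear_group_rnn(x, group_size=4, A=A, B=B)
print(h.shape)  # (8, 4)
```

Because the state resets at group boundaries, groups can be processed in parallel in a real implementation; the sequential loop here is only for clarity.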