Occ-LLM: Enhancing Autonomous Driving with Occupancy-Based Large Language Models

📅 2025-02-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address key challenges in autonomous driving, including joint modeling of dynamic objects and static scenes, low accuracy in 4D occupancy prediction, and weak multi-task coordination, this paper introduces Occ-LLM, the first occupancy-based large language model (LLM) for 4D occupancy perception in autonomous driving. Methodologically, the authors propose an occupancy-aware LLM architecture incorporating a Motion Separation Variational Autoencoder (MS-VAE) that leverages prior knowledge to disentangle dynamic and static components, thereby mitigating class imbalance. The framework unifies 4D occupancy forecasting, ego-vehicle trajectory planning, and natural-language scene question answering. On benchmarks including nuScenes, the method achieves gains of about 6% in IoU and 4% in mIoU for 4D occupancy forecasting over state-of-the-art methods. Moreover, it enables end-to-end planning and cross-modal scene understanding, demonstrating the effectiveness and generalizability of combining 4D occupancy representation with LLM-based reasoning.
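The motion-separation step described above can be illustrated with a minimal sketch: a semantic occupancy grid is split into a dynamic stream and a static stream by class membership before each stream is encoded. The class IDs and the split/merge helpers below are hypothetical illustrations, not the paper's implementation; the actual MS-VAE additionally encodes each stream with a tailored VAE.

```python
import numpy as np

# Assumed class IDs for illustration only: movable categories
# (e.g. car, pedestrian, cyclist) vs. static ones (road, building).
DYNAMIC_CLASSES = {2, 3, 4}
EMPTY = 0  # label for unoccupied voxels

def separate_motion(occ: np.ndarray):
    """Split a semantic occupancy grid into dynamic and static streams.

    Voxels whose class is in DYNAMIC_CLASSES go to the dynamic stream;
    all other occupied voxels go to the static stream.
    """
    dyn_mask = np.isin(occ, list(DYNAMIC_CLASSES))
    dynamic = np.where(dyn_mask, occ, EMPTY)
    static = np.where(dyn_mask, EMPTY, occ)
    return dynamic, static

def merge_motion(dynamic: np.ndarray, static: np.ndarray) -> np.ndarray:
    """Recombine the two streams; dynamic voxels take precedence."""
    return np.where(dynamic != EMPTY, dynamic, static)
```

Because the two streams partition the occupied voxels, merging them losslessly recovers the original grid, which is what lets the model focus on dynamic trajectories while still reconstructing the static scene.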

📝 Abstract
Large Language Models (LLMs) have made substantial advancements in the fields of robotics and autonomous driving. This study presents the first Occupancy-based Large Language Model (Occ-LLM), a pioneering effort to integrate LLMs with occupancy, an important representation in this domain. To effectively encode occupancy as input for the LLM and address the category imbalances associated with occupancy, we propose the Motion Separation Variational Autoencoder (MS-VAE). This approach utilizes prior knowledge to distinguish dynamic objects from static scenes before inputting them into a tailored Variational Autoencoder (VAE). This separation enhances the model's capacity to concentrate on dynamic trajectories while effectively reconstructing static scenes. The efficacy of Occ-LLM has been validated across key tasks, including 4D occupancy forecasting, self-ego planning, and occupancy-based scene question answering. Comprehensive evaluations demonstrate that Occ-LLM significantly surpasses existing state-of-the-art methodologies, achieving gains of about 6% in Intersection over Union (IoU) and 4% in mean Intersection over Union (mIoU) for the task of 4D occupancy forecasting. These findings highlight the transformative potential of Occ-LLM in reshaping current paradigms within robotics and autonomous driving.
Problem

Research questions and friction points this paper is trying to address.

Integrate LLMs with occupancy representation for autonomous driving
Address category imbalances in occupancy using MS-VAE
Enhance 4D forecasting and scene understanding in autonomous systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Occupancy-based Large Language Model
Motion Separation Variational Autoencoder
Dynamic and static scene separation
👥 Authors
Tianshuo Xu — The Hong Kong University of Science and Technology (Guangzhou); Diffusion Models, Autonomous Driving, Low-Level Computer Vision
Hao Lu — Department of AI Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou)
Xu Yan — Huawei Noah's Ark Lab
Yingjie Cai — Huawei Noah's Ark Lab
Bingbing Liu — Researcher, Huawei; Autonomous Driving, Robotics, Neural Rendering, Vision Foundation Model
Yingcong Chen — Department of AI Thrust, Information Hub, The Hong Kong University of Science and Technology (Guangzhou)