Vision-based Lifting of 2D Object Detections for Automated Driving

📅 2020-07-01
🏛️ Fusion
📈 Citations: 0
Influential: 0
🤖 AI Summary
To reduce the heavy reliance of 3D object detection in autonomous driving on expensive LiDAR sensors, this paper proposes a pure-vision monocular/binocular 2D-to-3D lifting method. The approach introduces an end-to-end, geometrically constrained pipeline that lifts the 2D bounding boxes produced by existing vision-based detectors into 3D space. To the authors' knowledge, it is the first to use a lightweight 2D CNN to process the point cloud associated with each 2D detection, which keeps the computational cost low. The method models all major road users jointly, including vehicles, pedestrians, and cyclists, rather than cars alone. On the KITTI 3D object detection benchmark, it achieves results comparable to state-of-the-art image-based approaches at the time while needing only about one-third of their runtime. This work offers an efficient, cost-effective path toward vision-only 3D perception for autonomous driving.

📝 Abstract
Image-based 3D object detection is an inevitable part of autonomous driving because cheap onboard cameras are already available in most modern cars. Because of the accurate depth information, currently most state-of-the-art 3D object detectors heavily rely on LiDAR data. In this paper, we propose a pipeline which lifts the results of existing vision-based 2D algorithms to 3D detections using only cameras as a cost-effective alternative to LiDAR. In contrast to existing approaches, we focus not only on cars but on all types of road users. To the best of our knowledge, we are the first using a 2D CNN to process the point cloud for each 2D detection to keep the computational effort as low as possible. Our evaluation on the challenging KITTI 3D object detection benchmark shows results comparable to state-of-the-art image-based approaches while having a runtime of only a third.
Problem

Research questions and friction points this paper is trying to address.

Lifting 2D object detections to 3D using only cameras
Providing a cost-effective alternative to LiDAR-based 3D detection
Focusing on all types of road users, not just cars
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lifts 2D to 3D detections using only cameras
Processes point cloud with 2D CNN for efficiency
Focuses on all road users, not just cars
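The summary does not say how the per-detection point cloud is obtained, so the following is only a minimal sketch of one common way to do it: back-projecting the pixels inside a 2D box into camera coordinates with the pinhole model. It assumes a per-pixel depth map is available (for example from stereo matching). The function name `backproject_box` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_box(
    depth: Sequence[Sequence[float]],
    box: Tuple[int, int, int, int],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project the pixels inside a 2D box into 3D camera coordinates.

    Assumptions (not from the paper): `depth[v][u]` is the metric depth of
    pixel (u, v), non-positive values mark invalid pixels, and the camera
    follows the pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy). `box` is (u_min, v_min, u_max, v_max), max-exclusive.
    """
    u_min, v_min, u_max, v_max = box
    points: List[Point3D] = []
    for v in range(v_min, v_max):
        for u in range(u_min, u_max):
            z = depth[v][u]
            if z <= 0.0:
                continue  # skip pixels without a valid depth estimate
            # Pinhole model: u = fx * x / z + cx, v = fy * y / z + cy
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 4x4 depth map: everything 2 m away, one invalid pixel at (u=2, v=2).
depth_map = [[2.0] * 4 for _ in range(4)]
depth_map[2][2] = 0.0
cloud = backproject_box(depth_map, (1, 1, 3, 3), fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Because the points come from a regular pixel grid, they can also be kept as an image-like array of (x, y, z) channels, which is the kind of input a 2D CNN can consume. Whether the paper arranges its input this way is not stated in this summary.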
Hendrik Königshof
FZI Research Center for Information Technology | Karlsruhe Institute of Technology (KIT)
Environment Perception, Localization, Planning
Kun Li
Intel. Systems and Prod. Engineering, FZI Research Center for Inf. Technology, Karlsruhe, Germany
C. Stiller
Institute of Meas. and Control Systems, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany