Enhancing Steering Estimation with Semantic-Aware GNNs

📅 2025-03-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited accuracy of steering estimation in autonomous driving when relying solely on 2D images. We propose a semantic-aware hybrid graph neural network–recurrent neural network (GNN-RNN) method that estimates depth and semantic labels from monocular RGB images, reconstructs pseudo-3D point clouds, and introduces a semantic-guided sparse graph connectivity strategy, in which edges predominantly connect nodes of the same semantic class and only 20% of inter-class connections are retained, to enable efficient spatiotemporal modeling. Crucially, the approach requires no LiDAR input and achieves LiDAR-level performance from monocular images alone: it improves over state-of-the-art pure 2D methods by 71% on KITTI and matches or surpasses the LiDAR-based baseline when evaluated on pseudo-3D inputs. The method delivers high accuracy, minimal hardware dependency (no LiDAR), and low computational overhead, establishing a cost-effective, reliable paradigm for steering estimation.
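The paper's code is not shown on this page; the sketch below illustrates how such a semantic-guided sparse connectivity rule could be implemented, assuming k-nearest-neighbour candidate edges and per-point semantic labels. The function name, choice of k, and random subsampling of inter-class edges are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_semantic_graph(points, labels, k=16, inter_keep=0.2, seed=0):
    """Build kNN candidate edges over a point cloud, then keep all
    intra-class edges and only a fraction `inter_keep` of the edges
    that cross semantic classes.

    points : (N, 3) float array of pseudo-3D coordinates
    labels : (N,)   int array of per-point semantic classes
    Returns an (E, 2) int array of directed edges (src, dst).
    """
    rng = np.random.default_rng(seed)
    tree = cKDTree(points)
    _, nbrs = tree.query(points, k=k + 1)      # k+1: first hit is the point itself
    src = np.repeat(np.arange(len(points)), k)
    dst = nbrs[:, 1:].reshape(-1)              # drop the self-neighbour column

    same_class = labels[src] == labels[dst]
    keep_inter = rng.random(src.shape[0]) < inter_keep
    mask = same_class | (~same_class & keep_inter)
    return np.stack([src[mask], dst[mask]], axis=1)
```

Keeping a small fraction of inter-class edges preserves connectivity across object boundaries while substantially reducing the total edge count.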

📝 Abstract
Steering estimation is a critical task in autonomous driving, traditionally relying on 2D image-based models. In this work, we explore the advantages of incorporating 3D spatial information through hybrid architectures that combine 3D neural network models with recurrent neural networks (RNNs) for temporal modeling, using LiDAR-based point clouds as input. We systematically evaluate four hybrid 3D models, all of which outperform the 2D-only baseline, with the Graph Neural Network (GNN)-RNN model yielding the best results. To reduce reliance on LiDAR, we leverage a pretrained unified model to estimate depth from monocular images, reconstructing pseudo-3D point clouds. We then adapt the GNN-RNN model, originally designed for LiDAR-based point clouds, to work with these pseudo-3D representations, achieving comparable or even superior performance to the LiDAR-based model. Additionally, the unified model provides semantic labels for each point, enabling a more structured scene representation. To further optimize graph construction, we introduce an efficient connectivity strategy in which connections are predominantly formed between points of the same semantic class, with only 20% of inter-class connections retained. This targeted approach reduces graph complexity and computational cost while preserving critical spatial relationships. Finally, we validate our approach on the KITTI dataset, achieving a 71% improvement over 2D-only models. Our findings highlight the advantages of 3D spatial information and efficient graph construction for steering estimation, while maintaining the cost-effectiveness of monocular images and avoiding the expense of LiDAR-based systems.
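For context on the pseudo-3D reconstruction step: lifting an estimated depth map into a point cloud is standard pinhole back-projection. Below is a minimal sketch, assuming known camera intrinsics (fx, fy, cx, cy, available from the KITTI calibration files) and a dense depth map from the pretrained model; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a dense depth map (H, W) into a pseudo-3D point cloud with
    the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
    Returns an (M, 3) array; zero-depth (invalid) pixels are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)
    return pts[z > 0]
```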
Problem

Research questions and friction points this paper is trying to address.

Improving steering estimation in autonomous driving using 3D spatial information
Reducing LiDAR reliance with pseudo-3D point clouds from monocular images
Optimizing graph construction via semantic-aware connectivity for efficient computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid 3D GNN-RNN model for LiDAR point clouds (a sketch follows this list)
Pseudo-3D point clouds from monocular depth estimation
Semantic-aware graph construction for efficiency
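
Below is a minimal PyTorch sketch of what such a hybrid GNN-RNN could look like; the class name, layer sizes, single round of mean-aggregation message passing, and global mean pooling are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GNNRNNSteering(nn.Module):
    """Per-frame graph encoder + GRU over frames + regression head.
    One round of mean-neighbour message passing stands in for the
    paper's GNN; a GRU provides the temporal (RNN) modelling.
    """
    def __init__(self, in_dim=3, hid=64):
        super().__init__()
        self.msg = nn.Linear(in_dim, hid)
        self.upd = nn.Linear(in_dim + hid, hid)
        self.gru = nn.GRU(hid, hid, batch_first=True)
        self.head = nn.Linear(hid, 1)               # scalar steering angle

    def encode_frame(self, x, edges):
        # x: (N, in_dim) node features; edges: (E, 2) long tensor, src -> dst
        src, dst = edges[:, 0], edges[:, 1]
        agg = torch.zeros(x.size(0), self.msg.out_features, device=x.device)
        agg.index_add_(0, dst, torch.relu(self.msg(x[src])))
        deg = torch.bincount(dst, minlength=x.size(0)).clamp(min=1)
        agg = agg / deg.unsqueeze(1).to(agg.dtype)  # mean aggregation
        h = torch.relu(self.upd(torch.cat([x, agg], dim=1)))
        return h.mean(dim=0)                        # global mean pooling

    def forward(self, frames):
        # frames: list of (x, edges) pairs, one per time step
        seq = torch.stack([self.encode_frame(x, e) for x, e in frames])
        out, _ = self.gru(seq.unsqueeze(0))         # (1, T, hid)
        return self.head(out[0, -1])                # predict from last step
```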
Authors

Fouad Makiyeh
PhD in Computer Vision and Robotics
Computer Vision, Deep Learning, Robotics, Applied Mathematics

Huy-Dung Nguyen
Hybrid Intelligence part of Capgemini Engineering

Patrick Chareyre
Hybrid Intelligence part of Capgemini Engineering

Ramin M. Hasani
Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology

Marc Blanchon
Hybrid Intelligence - Capgemini
Deep Learning, Polarimetry, Robotics, Computer Vision

Daniela Rus
Andrew (1956) and Erna Viterbi Professor of Computer Science, MIT
Robotics, Wireless Networks, Distributed Computing