Translating Images to Road Network: A Sequence-to-Sequence Perspective

📅 2024-02-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses end-to-end generation of structured road networks from images, a task that requires jointly modeling Euclidean geometry (e.g., landmark locations) and non-Euclidean topology (e.g., connectivity). We propose RoadNet Sequence, a unified integer-sequence representation that encodes both geometric and topological information in a single output. Methodologically, we design a sequence modeling framework that mixes autoregressive and non-autoregressive dependencies, introduce Topology-Inherited Training to transfer better topology knowledge into the model, and incorporate map priors (SD-Maps) collected from open-source map datasets to improve landmark detection and reachability. Key components include a BEV encoder, a non-autoregressive Transformer, and a sequence decoder. Evaluated on nuScenes, the RoadNet Sequence representation and the non-autoregressive approach outperform existing state-of-the-art alternatives, improving both efficiency and accuracy.

📝 Abstract
Road network extraction is essential for generating high-definition maps, since it enables the precise localization of road landmarks and their interconnections. However, generating a road network poses a significant challenge because it combines conflicting underlying structures: Euclidean (e.g., road landmark locations) and non-Euclidean (e.g., road topological connectivity). Existing methods struggle to merge the two types of data domains effectively, and few of them address the problem properly. Instead, our work establishes a unified representation of both data domains by projecting both Euclidean and non-Euclidean data into an integer series called RoadNet Sequence. Beyond modeling RoadNet Sequence with an auto-regressive sequence-to-sequence Transformer, we decouple its dependencies into a mixture of auto-regressive and non-autoregressive dependencies. Building on this, our proposed non-autoregressive sequence-to-sequence approach leverages the non-autoregressive dependencies while closing the gap to the auto-regressive dependencies, which yields gains in both efficiency and accuracy. We further identify two main bottlenecks in the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection limited by the BEV Encoder, and error propagation to topology reasoning. Therefore, we propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of the RoadNet Sequence representation and the non-autoregressive approach compared to existing state-of-the-art alternatives.
Problem

Research questions and friction points this paper is trying to address.

Extracting road networks from images for HD maps
Merging Euclidean and non-Euclidean data structures effectively
Improving landmark detection and topology reasoning accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Projecting road data into integer series representation
Decoupling dependencies via a mixed autoregressive/non-autoregressive approach
Leveraging topology inheritance and prior map information
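
To make the idea of projecting road data into an integer series more concrete, here is a minimal toy sketch. It is not the paper's actual RoadNet Sequence format: the triple layout (x bin, y bin, parent token), the function names, the coordinate ranges, and the bin count are all assumptions chosen for illustration. It also assumes the road network is a rooted tree and ignores merges and curve shape.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def quantize(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a coordinate in [lo, hi] to an integer bin in [0, bins - 1]."""
    clipped = min(max(value, lo), hi)
    return min(int((clipped - lo) / (hi - lo) * bins), bins - 1)


def encode_road_tree(
    points: Dict[str, Point],
    children: Dict[str, List[str]],
    root: str,
    x_range: Tuple[float, float] = (-48.0, 48.0),  # assumed range, in meters
    y_range: Tuple[float, float] = (-32.0, 32.0),  # assumed range, in meters
    bins: int = 200,                               # assumed resolution
) -> List[int]:
    """Flatten a rooted road tree into a flat list of non-negative integers.

    Each landmark becomes a triple (x_bin, y_bin, parent_token).
    parent_token is 0 for the root, otherwise 1 + the parent's position
    in the depth-first visiting order.
    """
    seq: List[int] = []
    position: Dict[str, int] = {}
    stack: List[Tuple[str, int]] = [(root, 0)]
    while stack:
        node, parent_token = stack.pop()
        position[node] = len(position)
        x, y = points[node]
        seq += [quantize(x, *x_range, bins), quantize(y, *y_range, bins), parent_token]
        # Push children in reverse so they are visited in their listed order.
        for child in reversed(children.get(node, [])):
            stack.append((child, position[node] + 1))
    return seq


def decode_road_tree(seq: List[int]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Recover quantized landmarks and directed (parent, child) edges."""
    triples = [tuple(seq[i:i + 3]) for i in range(0, len(seq), 3)]
    landmarks = [(x, y) for x, y, _ in triples]
    edges = [(p - 1, i) for i, (_, _, p) in enumerate(triples) if p > 0]
    return landmarks, edges


points = {"a": (0.0, 0.0), "b": (10.0, 0.0), "c": (10.0, 8.0), "d": (20.0, 0.0)}
children = {"a": ["b"], "b": ["c", "d"]}
seq = encode_road_tree(points, children, "a")
landmarks, edges = decode_road_tree(seq)
```

In this toy layout, geometry (the quantized coordinates) and topology (the parent tokens) live in the same integer sequence, so a sequence-to-sequence model could in principle predict both with one output head.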