🤖 AI Summary
This work addresses end-to-end generation of structured road networks from remote-sensing or vehicle-mounted imagery, a challenging task that requires jointly modeling Euclidean geometry (e.g., landmark coordinates) and non-Euclidean topology (e.g., connectivity). We propose RoadNet Sequence, a unified integer-sequence representation that encodes both geometric and topological information in a single output. Methodologically, we design a hybrid autoregressive/non-autoregressive sequence modeling framework, introduce Topology-Inherited Training to transfer better topology knowledge into the model, and incorporate priors from open-source standard-definition maps (SD-Maps) to improve landmark detection and reachability. Key components include a BEV encoder, a non-autoregressive Transformer, and a sequence decoder. Evaluated on nuScenes, the approach outperforms existing state-of-the-art methods: it improves both landmark detection accuracy and the robustness of connectivity inference while keeping the efficiency advantage of non-autoregressive decoding.
📝 Abstract
Road network extraction is essential for generating high-definition maps, since it enables the precise localization of road landmarks and their interconnections. However, generating a road network is challenging because it combines two conflicting kinds of structure: Euclidean (e.g., the locations of road landmarks) and non-Euclidean (e.g., the topological connectivity of roads). Existing methods struggle to merge the two data domains effectively, and few of them address the problem properly. In contrast, our work establishes a unified representation of both domains by projecting Euclidean and non-Euclidean data into an integer series called RoadNet Sequence. Beyond training an auto-regressive sequence-to-sequence Transformer to model RoadNet Sequence, we decouple the dependencies within RoadNet Sequence into a mixture of auto-regressive and non-autoregressive dependencies. Building on this, our proposed non-autoregressive sequence-to-sequence approach exploits the non-autoregressive dependencies while closing the gap to the auto-regressive ones, which yields gains in both efficiency and accuracy. We further identify two main bottlenecks of the current RoadNetTransformer on a non-overfitting split of the dataset: poor landmark detection, which is limited by the BEV Encoder, and the propagation of detection errors into topology reasoning. We therefore propose Topology-Inherited Training to inherit better topology knowledge into RoadNetTransformer. Additionally, we collect SD-Maps from open-source map datasets and use this prior information to significantly improve landmark detection and reachability. Extensive experiments on the nuScenes dataset demonstrate the superiority of the RoadNet Sequence representation and the non-autoregressive approach over existing state-of-the-art alternatives.
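To make the idea of projecting geometry and topology into one integer series more concrete, here is a minimal Python sketch. It is not the paper's actual RoadNet Sequence format, which the abstract does not specify. It assumes a hypothetical toy layout of three integers per vertex (`x_bin`, `y_bin`, `parent_index + 1`), a hypothetical 100 m square BEV range split into 200 bins, and a tree-shaped road graph in which every parent is listed before its children. Real road networks also contain merges, which this simplification ignores.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quantize(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a continuous coordinate in [lo, hi] to an integer bin in [0, bins - 1]."""
    clipped = min(max(value, lo), hi)
    return min(int((clipped - lo) * bins / (hi - lo)), bins - 1)


def encode_road_tree(vertices: List[Point], parents: List[int],
                     lo: float = -50.0, hi: float = 50.0,
                     bins: int = 200) -> List[int]:
    """Flatten a road tree into integers: [x_bin, y_bin, parent + 1] per vertex.

    parents[i] is the index of vertex i's predecessor, or -1 for a root.
    Hypothetical toy layout, not the format used in the paper.
    """
    seq: List[int] = []
    for i, ((x, y), parent) in enumerate(zip(vertices, parents)):
        if parent >= i:
            raise ValueError("each parent must appear before its child")
        # Geometry becomes two coordinate bins; topology becomes one index token.
        seq += [quantize(x, lo, hi, bins), quantize(y, lo, hi, bins), parent + 1]
    return seq


def decode_road_tree(seq: List[int], lo: float = -50.0, hi: float = 50.0,
                     bins: int = 200) -> Tuple[List[Point], List[Tuple[int, int]]]:
    """Recover bin-center coordinates and directed (parent, child) edges."""
    step = (hi - lo) / bins
    vertices: List[Point] = []
    edges: List[Tuple[int, int]] = []
    for i in range(0, len(seq), 3):
        x_bin, y_bin, parent_token = seq[i:i + 3]
        vertices.append((lo + (x_bin + 0.5) * step, lo + (y_bin + 0.5) * step))
        if parent_token > 0:  # token 0 marks a root vertex
            edges.append((parent_token - 1, i // 3))
    return vertices, edges


if __name__ == "__main__":
    # A small fork: vertex 1 splits into vertices 2 and 3.
    pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
    seq = encode_road_tree(pts, parents=[-1, 0, 1, 1])
    print(seq)
    print(decode_road_tree(seq)[1])
```

In this toy layout the coordinate tokens of different vertices do not depend on one another, while each connectivity token refers back to an earlier vertex. This gives a rough intuition for why a sequence of this kind can mix non-autoregressive and auto-regressive dependencies, although the paper's exact decomposition may differ.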