TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes

📅 2024-12-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses cross-modal (image-to-point-cloud) localization at traffic intersections, where the large viewpoint gap between roadside cameras and 3D reference maps makes matching difficult. It proposes TrafficLoc, a neural network that localizes traffic cameras within a 3D reference map through a coarse-to-fine matching pipeline. The method has three parts: (i) a Geometry-guided Attention Loss, applied during image-point cloud feature fusion to address cross-modal viewpoint inconsistencies; (ii) Inter-Intra Contrastive Learning during coarse matching, which aligns image patch-point group pairs while keeping the local features inside each pair distinct; and (iii) Dense Training Alignment with a soft-argmax operator, which considers additional features when regressing the final position. The authors also introduce Carla Intersection, a simulated dataset of 75 urban and rural intersections built in CARLA. On this dataset, TrafficLoc improves localization accuracy over state-of-the-art image-to-point-cloud registration methods by up to 86%, and it generalizes well to real-world data. It also reports new state-of-the-art results on KITTI and nuScenes, showing that the approach works for both in-vehicle and traffic cameras.

📝 Abstract
We tackle the problem of localizing traffic surveillance cameras in cooperative perception. To overcome the lack of large-scale real-world intersection datasets, we introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in CARLA. Moreover, we introduce a novel neural network, TrafficLoc, which localizes traffic cameras within a 3D reference map. TrafficLoc employs a coarse-to-fine matching pipeline. For image-point cloud feature fusion, we propose a novel Geometry-guided Attention Loss to address cross-modal viewpoint inconsistencies. During coarse matching, we propose Inter-Intra Contrastive Learning to achieve precise alignment while preserving distinctiveness among local intra-features within image patch-point group pairs. In addition, we introduce Dense Training Alignment with a soft-argmax operator to consider additional features when regressing the final position. Extensive experiments show that TrafficLoc improves localization accuracy over state-of-the-art image-to-point-cloud registration methods by a large margin (up to 86%) on Carla Intersection and generalizes well to real-world data. TrafficLoc also achieves new SOTA performance on the KITTI and nuScenes datasets, demonstrating strong localization ability across both in-vehicle and traffic cameras. Our project page is publicly available at https://tum-luk.github.io/projects/trafficloc/.
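The abstract does not give the formula for the contrastive objective, so the following is only a generic InfoNCE-style sketch of the underlying idea: matching image and point features are pulled together while non-matching ones are pushed apart. It is not the paper's Inter-Intra formulation. The function name `info_nce`, the temperature value, and the random features are all assumptions made for illustration.

```python
import numpy as np

def info_nce(image_feats, point_feats, temperature=0.07):
    """Generic InfoNCE loss: row i of image_feats is the positive for row i of point_feats.

    This is a textbook contrastive loss, NOT the paper's Inter-Intra variant.
    """
    # L2-normalize so the dot product is a cosine similarity.
    a = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    b = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row maximum for numerical stability before the log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positives sit on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))          # hypothetical features for 4 matched pairs
aligned = info_nce(feats, feats)         # every pair correctly matched
shuffled = info_nce(feats, feats[::-1])  # every pair deliberately mismatched
```

In this toy setup the loss for correctly matched pairs is lower than for mismatched pairs, which is the behavior a contrastive objective rewards during training.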
Problem

Research questions and friction points this paper is trying to address.

Localizing traffic cameras in 3D reference maps
Overcoming cross-modal matching challenges at intersections
Improving 2D-3D feature fusion and position regression
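
To make the localization problem concrete: placing a camera in a 3D reference map amounts to estimating a pose that maps map points to image pixels. The sketch below shows the standard pinhole projection for a given pose; it does not reproduce anything from TrafficLoc itself. The function name `project_points` and all numeric values (intrinsics, pose, points) are hypothetical.

```python
import numpy as np

def project_points(points_world, R, t, K):
    """Project N x 3 world points into pixels with a pinhole camera model.

    R (3x3) and t (3,) map world coordinates to camera coordinates;
    K (3x3) is the intrinsic matrix. All inputs here are illustrative.
    """
    points_cam = points_world @ R.T + t   # world frame -> camera frame
    in_front = points_cam[:, 2] > 0       # flag points ahead of the camera
    uvw = points_cam @ K.T                # camera frame -> homogeneous pixels
    pixels = uvw[:, :2] / uvw[:, 2:3]     # perspective divide
    return pixels, in_front

# Hypothetical intrinsics and an identity pose (camera at the map origin).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 10.0],
                   [1.0, -0.5, 20.0]])
pixels, in_front = project_points(points, R, t, K)
# pixels is [[320, 240], [360, 220]]
```

A localization method works in the opposite direction: given 2D-3D correspondences between pixels and map points, it recovers the `R` and `t` that make this projection agree with the image.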
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-guided Attention Loss for 2D-3D fusion
Inter-Intra Contrastive Learning for precise coarse alignment with distinctive local features
Dense Training Alignment with soft-argmax regression
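
The soft-argmax operator named in the last item can be sketched as follows. This is the generic operator only, assuming a 2D score map; how TrafficLoc builds that map and applies Dense Training Alignment is not described in this summary. The function name `soft_argmax_2d`, the `temperature` parameter, and the toy scores are assumptions.

```python
import numpy as np

def soft_argmax_2d(scores, temperature=1.0):
    """Return the expected (x, y) position under softmax(scores / temperature)."""
    h, w = scores.shape
    logits = scores.reshape(-1) / temperature
    logits = logits - logits.max()        # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    probs = probs.reshape(h, w)
    ys, xs = np.mgrid[0:h, 0:w]           # per-cell row and column indices
    return float((probs * xs).sum()), float((probs * ys).sum())

# Toy score map with two equally strong neighboring peaks at columns 3 and 4.
scores = np.full((5, 5), -10.0)
scores[2, 3] = 5.0
scores[2, 4] = 5.0
x, y = soft_argmax_2d(scores)
# x is close to 3.5 and y is close to 2.0
```

Unlike a hard argmax, the result is a weighted average over all cells, so it can fall between grid positions and remains differentiable, which is why the operator is commonly used for position regression.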