TrafficLoc: Localizing Traffic Surveillance Cameras in 3D Scenes

📅 2024-12-13

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address cross-modal (image to point cloud) localization at traffic intersections, where large viewpoint differences between roadside cameras and the 3D reference map degrade matching accuracy, this paper proposes TrafficLoc, a neural network built on a coarse-to-fine matching pipeline. The method has three components: (i) a Geometry-guided Attention Loss that addresses cross-modal viewpoint inconsistencies during image-point cloud feature fusion; (ii) Inter-Intra Contrastive Learning, applied during coarse matching, that aligns image patch-point group pairs while keeping the local features within each pair distinct; and (iii) Dense Training Alignment with a soft-argmax operator, which takes additional features into account when regressing the final position. On Carla Intersection, a new simulated dataset of 75 urban and rural intersections built in CARLA, TrafficLoc improves localization accuracy over state-of-the-art image-to-point cloud registration methods by up to 86%. It also sets new state-of-the-art results on the KITTI and nuScenes datasets and generalizes to real-world data, showing localization ability for both in-vehicle and traffic cameras.

📝 Abstract

We tackle the problem of localizing traffic surveillance cameras in cooperative perception. To overcome the lack of large-scale real-world intersection datasets, we introduce Carla Intersection, a new simulated dataset with 75 urban and rural intersections in CARLA. Moreover, we introduce a novel neural network, TrafficLoc, which localizes traffic cameras within a 3D reference map. TrafficLoc employs a coarse-to-fine matching pipeline. For image-point cloud feature fusion, we propose a novel Geometry-guided Attention Loss to address cross-modal viewpoint inconsistencies. During coarse matching, we propose Inter-Intra Contrastive Learning to achieve precise alignment while preserving distinctiveness among local intra-features within image patch-point group pairs. In addition, we introduce Dense Training Alignment with a soft-argmax operator to consider additional features when regressing the final position. Extensive experiments show that TrafficLoc improves localization accuracy over state-of-the-art image-to-point cloud registration methods by a large margin (up to 86%) on Carla Intersection and generalizes well to real-world data. TrafficLoc also achieves new SOTA performance on the KITTI and nuScenes datasets, demonstrating strong localization ability across both in-vehicle and traffic cameras. Our project page is publicly available at https://tum-luk.github.io/projects/trafficloc/.
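
The soft-argmax operator mentioned in the abstract can be illustrated with a minimal sketch. This is the generic operator, not the paper's implementation: the function name `soft_argmax_2d`, the `temperature` parameter, and the toy score map are assumptions made for illustration. The idea is to replace a hard argmax over a matching score map with the softmax-weighted average of the coordinates, which is differentiable and can land between grid cells.

```python
import math

def soft_argmax_2d(scores, temperature=1.0):
    """Return the expected (row, col) under a softmax over a 2D score map.

    Generic soft-argmax; name and temperature are illustrative assumptions.
    """
    width = len(scores[0])
    flat = [s / temperature for row in scores for s in row]
    peak = max(flat)  # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in flat]
    total = sum(weights)
    # Weighted average of the integer grid coordinates.
    row = sum(w * (i // width) for i, w in enumerate(weights)) / total
    col = sum(w * (i % width) for i, w in enumerate(weights)) / total
    return row, col

# Toy score map (hypothetical): two equal peaks side by side in the middle row.
scores = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
row, col = soft_argmax_2d(scores, temperature=0.1)
print(row, col)  # the estimate falls between the two peak columns
```

A hard argmax would have to pick one of the two peak cells, whereas the soft version returns a position between them. A lower temperature makes the result approach the hard argmax when there is a single dominant peak.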

Problem

Research questions and friction points this paper is trying to address.

Localizing traffic cameras in 3D reference maps

Overcoming viewpoint inconsistencies in cross-modal (image to point cloud) matching at intersections

Improving 2D-3D feature fusion and position regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-guided Attention Loss for 2D-3D fusion

Inter-Intra Contrastive Learning for aligned yet distinctive local features in coarse matching

Dense Training Alignment with soft-argmax regression
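
As a rough illustration of the contrastive idea behind the second item, the sketch below computes a generic InfoNCE-style loss over a similarity matrix. It is not the paper's Inter-Intra formulation, which this page does not spell out: the function name `info_nce`, the `temperature` value, and the toy matrices are assumptions. Entry `[i][j]` is taken to be the similarity between image feature `i` and point feature `j`, with the diagonal holding the matching pairs.

```python
import math

def info_nce(similarity, temperature=0.07):
    """Mean cross-entropy over rows, treating entry [i][i] as the positive pair.

    Generic InfoNCE-style loss; an illustrative stand-in, not the paper's loss.
    """
    losses = []
    for i, row in enumerate(similarity):
        logits = [s / temperature for s in row]
        peak = max(logits)  # log-sum-exp with max subtraction for stability
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)

# Hypothetical similarity matrices for three image/point feature pairs.
aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
confused = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

loss_aligned = info_nce(aligned)
loss_confused = info_nce(confused)
print(loss_aligned < loss_confused)
```

The loss is low when each feature is most similar to its own partner and high when all candidates look alike, which is the sense in which a contrastive objective rewards both alignment and distinctiveness.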

🔎 Similar Papers

City-Scale Multi-Camera Vehicle Tracking System with Improved Self-Supervised Camera Link Model