🤖 AI Summary
To address the high annotation cost, LiDAR dependency, and temporal inconsistency of 3D traffic light and sign labeling for end-to-end autonomous driving navigation, this paper proposes a long-range (200 m) 3D annotation method that relies solely on RGB images, 2D detection bounding boxes, and GNSS/INS data. The approach integrates monocular geometric reasoning, multi-frame motion consistency constraints, and pose optimization into an end-to-end 3D bounding box generation pipeline. Without LiDAR, it achieves centimeter-level localization accuracy and temporally consistent annotations. The method significantly lowers hardware requirements and annotation cost while keeping annotation error below 15 cm and improving throughput by over 50× compared to conventional methods. The generated annotations are directly usable for training real-time 3D detection models. To our knowledge, this is the first solution enabling production-grade, long-range, high-accuracy, temporally consistent 3D annotation from purely visual input.
📝 Abstract
3D detection of traffic management objects, such as traffic lights and road signs, is vital for self-driving cars, particularly for address-to-address navigation, where vehicles encounter numerous intersections containing these static objects. This paper introduces a novel method for automatically generating accurate and temporally consistent 3D bounding box annotations for traffic lights and signs, effective up to a range of 200 meters. These annotations are suitable for training the data-hungry real-time models used in self-driving cars. The proposed method relies only on RGB images with 2D bounding boxes of traffic management objects, which can be obtained automatically from an off-the-shelf image-space detector network, along with GNSS/INS data, eliminating the need for LiDAR point clouds.
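The core geometric idea, combining 2D detections of the same static object across frames with known ego poses, can be illustrated with classic multi-view (DLT) triangulation. The sketch below is not the paper's actual pipeline (which adds motion-consistency constraints and pose optimization); it is a minimal, assumed illustration where each frame's GNSS/INS pose yields a camera projection matrix and the 2D box centers serve as observations:

```python
import numpy as np

def triangulate_point(proj_mats, pixels):
    """Linear (DLT) triangulation of one static 3D point.

    proj_mats: list of 3x4 camera projection matrices K @ [R|t],
               here assumed to come from GNSS/INS ego poses.
    pixels:    list of (u, v) observations, e.g. the centers of the
               2D detection boxes for the same traffic light.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        # Each observation gives two linear constraints on the
        # homogeneous point X: u*(P[2]·X) = P[0]·X, similarly for v.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.asarray(rows)
    # Homogeneous least squares: last right singular vector of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Synthetic check: a static point observed from a forward-moving camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
X_true = np.array([2.0, 1.0, 50.0])  # 50 m ahead, off to the side
proj_mats, pixels = [], []
for tx in (0.0, 1.0, 2.0):  # ego translation between frames
    Rt = np.hstack([np.eye(3), np.array([[-tx], [0.0], [0.0]])])
    P = K @ Rt
    x = P @ np.append(X_true, 1.0)
    proj_mats.append(P)
    pixels.append((x[0] / x[2], x[1] / x[2]))

X_est = triangulate_point(proj_mats, pixels)
```

In practice the baseline between frames is small relative to a 200 m range, so noiseless triangulation like this would be poorly conditioned; this is presumably why the paper adds multi-frame consistency constraints and pose refinement on top of the basic geometry.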