AI Summary
This work addresses anomaly detection in digital map lane-rendering images. We propose a four-stage Transformer-based pipeline: data preprocessing → masked image modeling (MiM) self-supervised pretraining → label-smoothed cross-entropy fine-tuning → post-processing. To our knowledge, this is the first application of MiM to this task; it integrates the Swin Transformer with uniform masking and introduces a task-specific end-to-end classification architecture and fine-tuning strategy tailored to the structural characteristics of map imagery. On the benchmark dataset, our method achieves 94.77% accuracy (+0.76%) and an AUC of 0.9743 (+0.0245), while reducing fine-tuning epochs from 280 to 41, a nearly 7× improvement in training efficiency. The core contributions are: (i) pioneering the adaptation of MiM to lane-rendering anomaly detection; and (ii) demonstrating substantial gains in both detection accuracy and training efficiency over prior approaches.
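As an illustration of the masking stage, the sketch below shows how a Uniform-Masking-style patch mask could be generated for a pyramid backbone such as Swin: one patch is kept per non-overlapping 2×2 block (uniform sampling), then a further fraction of the kept patches is re-masked (secondary masking). The grid size, the ratios, and the helper name `uniform_mask` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def uniform_mask(grid=(14, 14), secondary_ratio=0.25, seed=0):
    """Illustrative Uniform-Masking-style mask for MiM pretraining.

    Returns a boolean grid over image patches where True = masked.
    Step 1 (uniform sampling): keep exactly one random patch per 2x2
    block, so visible patches stay evenly spread, which suits
    window-based pyramid backbones such as Swin.
    Step 2 (secondary masking): re-mask a fraction of the kept patches
    to make the reconstruction task harder.
    """
    rng = np.random.default_rng(seed)
    gh, gw = grid
    mask = np.ones((gh, gw), dtype=bool)  # start fully masked
    # Step 1: unmask one random patch in every non-overlapping 2x2 block.
    for i in range(0, gh, 2):
        for j in range(0, gw, 2):
            di, dj = rng.integers(0, 2, size=2)
            mask[i + di, j + dj] = False
    # Step 2: re-mask a fraction of the surviving (visible) patches.
    flat = mask.ravel()
    kept = np.flatnonzero(~flat)
    drop = rng.choice(kept, size=int(len(kept) * secondary_ratio),
                      replace=False)
    flat[drop] = True
    return flat.reshape(gh, gw)
```

With a 14×14 patch grid this leaves roughly 19% of patches visible, keeping at most one visible patch per 2×2 block.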
Abstract
Burgeoning navigation services based on digital maps provide great convenience to drivers. Nevertheless, anomalies in lane-rendering map images occasionally introduce potential hazards, as they can mislead human drivers and thereby contribute to unsafe driving conditions. To detect such anomalies accurately and efficiently, this paper casts lane-rendering image anomaly detection as a classification problem and proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using a cross-entropy-based loss with label smoothing, and post-processing, leveraging state-of-the-art deep learning techniques, especially Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that it delivers superior performance in lane-rendering image anomaly detection and, notably, that self-supervised pre-training with MiM greatly enhances detection accuracy while significantly reducing total training time. For instance, the Swin Transformer with Uniform Masking as self-supervised pre-training (Swin-Trans-UM) achieved a higher accuracy of 94.77% and an improved Area Under the Curve (AUC) score of 0.9743, compared with an accuracy of 94.01% and an AUC of 0.9498 for the pure Swin Transformer without pre-training (Swin-Trans), while the fine-tuning epochs were dramatically reduced from 280 to 41. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for improving both the accuracy and the efficiency of lane-rendering image anomaly detection in digital navigation systems.
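To make the fine-tuning objective concrete, here is a minimal NumPy sketch of cross-entropy with label smoothing: the hard one-hot target is replaced by (1 − ε)·one-hot + ε/K uniform mass over the K classes. The function name and the ε value are illustrative assumptions; the paper's exact loss configuration may differ.

```python
import numpy as np

def label_smoothed_ce(logits, target, eps=0.1):
    """Cross-entropy with label smoothing for a single example.

    The one-hot target is softened to
    (1 - eps) * one_hot(target) + eps / K, which discourages
    over-confident predictions during fine-tuning.
    """
    logits = np.asarray(logits, dtype=float)
    k = logits.shape[-1]
    # Numerically stable log-softmax.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    # Smoothed target distribution.
    smoothed = np.full(k, eps / k)
    smoothed[target] += 1.0 - eps
    return float(-(smoothed * log_probs).sum())
```

With `eps=0.0` this reduces to the standard cross-entropy; a small positive ε raises the loss on confident correct predictions, which acts as a regularizer during fine-tuning.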