AI Summary
This paper addresses the challenge of evaluating the robustness of traffic sign recognition models under adversarial attacks and distribution shifts. To this end, it introduces VISAT, the first vision-attribute-oriented benchmark dataset and evaluation framework. Methodologically, it proposes an attribute-driven multi-task learning (MTL) paradigm; integrates PGD-based adversarial attacks, ImageNet-C corruptions, and color quantization augmentation; and conducts systematic evaluations on ResNet-152 and ViT-B/32 backbones. Crucially, it pioneers the incorporation of semantic attributes into robustness assessment, uncovering spurious task correlations induced by adversarial perturbations and color manipulations. Experimental results show that MTL models generalize better under distribution shifts but suffer from cross-task interference, and that adversarial attacks substantially degrade the consistency of attribute predictions, exposing critical reliability vulnerabilities in current models.
Abstract
We present VISAT, a novel open dataset and benchmarking suite for evaluating model robustness in the task of traffic sign recognition in the presence of visual attributes. Built upon the Mapillary Traffic Sign Dataset (MTSD), our dataset introduces two benchmarks that respectively emphasize robustness against adversarial attacks and distribution shifts. For our adversarial attack benchmark, we employ the state-of-the-art Projected Gradient Descent (PGD) method to generate adversarial inputs and evaluate their impact on popular models. Additionally, we investigate the effect of adversarial attacks on attribute-specific multi-task learning (MTL) networks, revealing spurious correlations among MTL tasks. The MTL networks leverage visual attributes (color, shape, symbol, and text) that we have created for each traffic sign in our dataset. For our distribution shift benchmark, we utilize ImageNet-C's realistic data corruption and natural variation techniques to evaluate the robustness of both base and MTL models. Moreover, we further explore spurious correlations among MTL tasks through synthetic alterations of traffic sign colors using color quantization techniques. Our experiments focus on two major backbones, ResNet-152 and ViT-B/32, and compare the performance of base and MTL models. The VISAT dataset and benchmarking framework contribute to the understanding of model robustness for traffic sign recognition, shedding light on the challenges posed by adversarial attacks and distribution shifts. We believe this work will facilitate advancements in developing more robust models for real-world applications in autonomous driving and cyber-physical systems.
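To make the attack setup concrete, the following is a minimal sketch of an L-infinity PGD attack as used for benchmarks of this kind: take signed gradient steps that increase the classifier's loss, then project the perturbed input back into an epsilon-ball around the clean image. The toy softmax-regression "model" and all parameter values here are illustrative assumptions, not the paper's actual ResNet-152/ViT-B/32 setup or hyperparameters.

```python
import numpy as np

def pgd_linf(x, y, grad_fn, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD: ascend the loss, then project into the eps-ball.

    x: clean input in [0, 1] (flattened image)
    y: true label index
    grad_fn(x, y): gradient of the classification loss w.r.t. the input
    """
    x_adv = x + np.random.uniform(-eps, eps, x.shape)  # random start
    x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(steps):
        g = grad_fn(x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)        # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid image
    return x_adv

# Hypothetical stand-in for a trained network: softmax regression with a
# fixed random weight matrix over a 16-dimensional "image" and 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def grad_wrt_input(x, y):
    # Gradient of cross-entropy w.r.t. the input for a linear model:
    # dL/dx = W^T (p - onehot(y))
    p = softmax(W @ x)
    onehot = np.eye(3)[y]
    return W.T @ (p - onehot)

x = rng.uniform(0.4, 0.6, size=16)
y = int(np.argmax(softmax(W @ x)))  # attack the model's clean prediction
x_adv = pgd_linf(x, y, grad_wrt_input)
```

With ten steps of size 2/255, the cumulative movement exceeds the 8/255 budget, so the projection step is what keeps the final perturbation imperceptibly small; this is the standard PGD trade-off between step size and number of iterations.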