Traffic Sign Recognition in Autonomous Driving: Dataset, Benchmark, and Field Experiment

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key challenges in traffic sign recognition for autonomous driving—namely cross-regional variation, long-tailed category distribution, and semantic ambiguity—by introducing TS-1M, a large-scale dataset comprising over one million images annotated across 454 standardized classes. The study further proposes the first fine-grained diagnostic benchmark tailored for real-world deployment, evaluating cross-regional generalization, rare-class detection, robustness under low-resolution conditions, and semantic comprehension. Through a unified assessment of classic supervised models, self-supervised pre-trained architectures, and multimodal vision-language models (VLMs), the research highlights the critical role of semantic alignment in enhancing model generalization. Experimental results demonstrate that VLMs substantially outperform purely visual models and effectively support map-level decision constraints in real-world road systems.

📝 Abstract
Traffic Sign Recognition (TSR) is a core perception capability for autonomous driving, where robustness to cross-region variation, long-tailed categories, and semantic ambiguity is essential for reliable real-world deployment. Despite steady progress in recognition accuracy, existing traffic sign datasets and benchmarks offer limited diagnostic insight into how different modeling paradigms behave under these practical challenges. We present TS-1M, a large-scale and globally diverse traffic sign dataset comprising over one million real-world images across 454 standardized categories, together with a diagnostic benchmark designed to analyze model capability boundaries. Beyond standard train-test evaluation, we provide a suite of challenge-oriented settings, including cross-region recognition, rare-class identification, low-clarity robustness, and semantic text understanding, enabling systematic and fine-grained assessment of modern TSR models. Using TS-1M, we conduct a unified benchmark across three representative learning paradigms: classical supervised models, self-supervised pretrained models, and multimodal vision-language models (VLMs). Our analysis reveals consistent paradigm-dependent behaviors, showing that semantic alignment is a key factor for cross-region generalization and rare-category recognition, while purely visual models remain sensitive to appearance shift and data imbalance. Finally, we validate the practical relevance of TS-1M through real-scene autonomous driving experiments, where traffic sign recognition is integrated with semantic reasoning and spatial localization to support map-level decision constraints. Overall, TS-1M establishes a reference-level diagnostic benchmark for TSR and provides principled insights into robust and semantic-aware traffic sign perception. Project page: https://guoyangzhao.github.io/projects/ts1m.
Problem

Research questions and friction points this paper is trying to address.

Traffic Sign Recognition
Cross-region variation
Long-tailed categories
Semantic ambiguity
Diagnostic benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Traffic sign recognition
Large-scale dataset
Diagnostic benchmark
Cross-region generalization
Vision-language models
Guoyang Zhao
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
Weiqing Qi
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
Kai Zhang
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
Chenguang Zhang
Graduate student, University of Pittsburgh
motor neuroscience, brain-computer interface
Zeying Gong
The Hong Kong University of Science and Technology (Guangzhou)
Forecasting, Embodied AI
Zhihai Bi
Fudan University; HKUST(GZ)
Robotics, Loco-Manipulation, Reinforcement Learning
Kai Chen
Hong Kong University of Science and Technology
Representation Learning, Generative Modeling, Multi-modality, Mixture-of-Experts
Benshan Ma
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
Ming Liu
Shenzhen Unity Drive Innovation Technology Co., Ltd., Shenzhen 518063, China
Jun Ma
Assistant Professor, The Hong Kong University of Science and Technology
Robotics, Autonomous Driving, Motion Planning and Control, Optimization