🤖 AI Summary
To address high false-detection rates in traffic sign recognition (TSR) under adverse weather and low-quality data, as well as poor robustness in few-shot and occlusion scenarios, this paper proposes a human-in-the-loop inference framework. It dynamically injects real-time human verification signals into the YOLOv8 detection pipeline and integrates Video-LLaVA, a video-capable vision-language model, to enable spatiotemporal semantic alignment and error attribution. Detection and visual-language understanding are jointly optimized via LoRA fine-tuning and attention-guided training. Evaluated on BDD100K-TS and TT100K-v2, the method achieves mAP@0.5 of 86.3% and 79.1%, respectively, reduces false detections by 37%, lowers human-intervention frequency by 52%, and keeps end-to-end latency below 120 ms. This work introduces the first closed-loop "detection–understanding–feedback" mechanism for TSR, significantly improving accuracy and reliability in complex traffic environments, particularly for speed-limit sign recognition.
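The closed-loop "detection–understanding–feedback" idea can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's implementation: detections whose confidence falls inside an uncertain band are routed to a human verifier, and the aggregated verdicts are fed back to adjust that band, reducing how often humans are asked to intervene when the detector is reliable.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """A single candidate traffic-sign detection (illustrative)."""
    label: str
    confidence: float

def route_detections(dets, low=0.35, high=0.80):
    """Split detections into auto-accepted, human-review, and
    auto-rejected sets based on a confidence band (hypothetical
    thresholds, not from the paper)."""
    accepted = [d for d in dets if d.confidence >= high]
    review = [d for d in dets if low <= d.confidence < high]
    rejected = [d for d in dets if d.confidence < low]
    return accepted, review, rejected

def update_band(low, high, verdicts, step=0.02):
    """Feedback step: shrink the review band when human verdicts
    (1 = detector confirmed, 0 = overturned) mostly agree with the
    detector; widen it when they mostly disagree."""
    if not verdicts:
        return low, high
    agree = sum(verdicts) / len(verdicts)
    if agree > 0.9:   # detector reliable: send fewer cases to humans
        return low + step, high - step
    if agree < 0.5:   # detector unreliable: send more cases to humans
        return max(0.0, low - step), min(1.0, high + step)
    return low, high
```

In this sketch, shrinking the band is what would drive down human-intervention frequency over time, while widening it trades latency for reliability in hard conditions.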