🤖 AI Summary
To address the high computational overhead of video-based Automatic License Plate Recognition (ALPR) systems, which traditionally process many frames per vehicle, this paper proposes two methods that extract exactly one frame per vehicle and recognize its license plate from that single image. The first uses Visual Rhythm (VR) to condense a video into time-spatial images from which the per-vehicle frame is selected; the second, Accumulative Line Analysis (ALA), is a novel algorithm based on single-line video processing designed for real-time operation. Both methods apply YOLO for license plate detection in the selected frame and a Convolutional Neural Network (CNN) for optical character recognition of the plate text. Experiments on real traffic videos show accuracy comparable to traditional frame-by-frame approaches while processing roughly three times faster, substantially reducing computational demands.
📝 Abstract
Video-based Automatic License Plate Recognition (ALPR) involves extracting vehicle license plate text information from video captures. Traditional systems typically rely heavily on high-end computing resources and utilize multiple frames to recognize license plates, leading to increased computational overhead. In this paper, we propose two methods capable of efficiently extracting exactly one frame per vehicle and recognizing its license plate characters from this single image, thus significantly reducing computational demands. The first method uses Visual Rhythm (VR) to generate time-spatial images from videos, while the second employs Accumulative Line Analysis (ALA), a novel algorithm based on single-line video processing for real-time operation. Both methods leverage YOLO for license plate detection within the frame and a Convolutional Neural Network (CNN) for Optical Character Recognition (OCR) to extract textual information. Experiments on real videos demonstrate that the proposed methods achieve results comparable to traditional frame-by-frame approaches, with processing speeds three times faster.
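The core single-frame idea can be illustrated with a minimal NumPy sketch. `visual_rhythm` stacks one pixel row per frame into a time-spatial image, the VR construction the abstract describes; `crossing_frames` is a *hypothetical* single-line analysis in the spirit of ALA (the paper's actual algorithm may differ) that flags activity on the sampled line and reports one representative frame per contiguous crossing, i.e. one frame per vehicle. All function names, the activity threshold, and the toy frames are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def visual_rhythm(frames, line_row):
    """Visual Rhythm: stack the pixel row `line_row` of every frame into a
    single time-spatial image (vertical axis = time, horizontal = line width).
    A vehicle crossing the sampled line appears as a blob in this image."""
    return np.stack([frame[line_row] for frame in frames], axis=0)

def crossing_frames(frames, line_row, threshold=50):
    """Hypothetical single-line sketch (ALA-like, not the paper's algorithm):
    mark each frame whose sampled line has intensity above `threshold`, then
    return the middle frame index of each contiguous active run, yielding
    exactly one frame per vehicle crossing."""
    active = [frames[t][line_row].max() > threshold for t in range(len(frames))]
    picks, start = [], None
    for t, is_active in enumerate(active):
        if is_active and start is None:
            start = t                           # crossing begins
        elif not is_active and start is not None:
            picks.append((start + t - 1) // 2)  # middle frame of the run
            start = None
    if start is not None:                       # crossing still active at end
        picks.append((start + len(frames) - 1) // 2)
    return picks

# Toy demo: five 4x6 grayscale frames; a bright "vehicle" occupies the
# sampled row (row 2) during frames 2 and 3 only.
frames = [np.zeros((4, 6), dtype=np.uint8) for _ in range(5)]
frames[2][2, 1:5] = 255
frames[3][2, 1:5] = 255

vr = visual_rhythm(frames, line_row=2)
print(vr.shape)                      # (5, 6): one row per frame
print(crossing_frames(frames, 2))    # [2]: one frame for the one vehicle
```

The selected frame would then be handed to the detector and OCR stages, so the expensive models run once per vehicle rather than once per video frame.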