🤖 AI Summary
To address resource-constrained estimation of contraflow cycling (bicycles or e-bikes riding against the designated traffic flow) in urban traffic surveillance, this paper proposes WWC-Predictor, a sparse sampling framework that avoids the computational redundancy of conventional frame-wise dense tracking. The method combines bounding-box locations from object detection with self-supervised image orientation estimation, and pairs temporal sparse sampling with regression-based prediction so that behavior can be classified from instantaneous observations at very low sampling rates. Evaluated on a 35-minute real-world video dataset with minute-level annotation, WWC-Predictor achieves a mean estimation error of 1.475% while using only 19.12% of the GPU time required by straightforward tracking methods with the same detection model. Its core contributions are threefold: (i) the first incorporation of image orientation perception into contraflow cycling proportion estimation; (ii) the design of a lightweight, high-accuracy, and deployment-ready sparse monitoring paradigm; and (iii) empirical validation of robust performance under severe sampling constraints.
📝 Abstract
In the field of transportation, it is of paramount importance to address and mitigate illegal actions committed by both motor and non-motor vehicles. Among those actions, wrong-way cycling (i.e., riding a bicycle or e-bike in the opposite direction of the designated traffic flow) poses significant risks to both cyclists and other road users. To this end, this paper formulates the problem of estimating wrong-way cycling ratios in CCTV videos. Specifically, we propose a sparse sampling method called WWC-Predictor to solve this problem efficiently, addressing the inefficiency of direct tracking methods. Our approach leverages both detection-based information, which is derived from bounding boxes, and orientation-based information, which is derived from the image content itself, to improve how much can be inferred from a single instant. On our proposed benchmark dataset, consisting of 35 minutes of video sequences with minute-level annotation, our method achieves an average error rate of only 1.475% while using only 19.12% of the GPU time of straightforward tracking methods under the same detection model. These results demonstrate the effectiveness of our approach for estimating wrong-way cycling ratios.
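To make the sparse sampling idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical callback `classify_frame` that returns, for one frame, the number of cyclists judged to be riding the right way and the wrong way (standing in for the detection and orientation models), and it pools those instantaneous counts over every `stride`-th frame. The function names, the fixed-stride sampling, and the pooled-count estimator are all assumptions for illustration; the regression-based prediction step of WWC-Predictor is not reproduced here.

```python
from typing import Callable, Tuple

# Hypothetical signature: frame index -> (right_way_count, wrong_way_count).
FrameClassifier = Callable[[int], Tuple[int, int]]


def sparse_frame_indices(num_frames: int, stride: int) -> range:
    """Indices of the frames that are actually processed (assumed fixed stride)."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return range(0, num_frames, stride)


def estimate_wrong_way_ratio(
    num_frames: int, stride: int, classify_frame: FrameClassifier
) -> float:
    """Pool instantaneous counts over sparsely sampled frames.

    Returns wrong-way detections divided by all detections, or 0.0 if no
    cyclist was seen in any sampled frame.
    """
    right_total = 0
    wrong_total = 0
    for idx in sparse_frame_indices(num_frames, stride):
        right, wrong = classify_frame(idx)
        right_total += right
        wrong_total += wrong
    total = right_total + wrong_total
    return wrong_total / total if total else 0.0


if __name__ == "__main__":
    # Toy classifier: every sampled frame shows 3 right-way and 1 wrong-way cyclist.
    ratio = estimate_wrong_way_ratio(100, 5, lambda idx: (3, 1))
    sampled = len(sparse_frame_indices(100, 5))
    print(f"estimated wrong-way ratio: {ratio:.3f} from {sampled} of 100 frames")
```

Pooled instantaneous counts weight each cyclist by how long they stay in view, so this naive estimator can differ from a per-cyclist ratio when right-way and wrong-way riders move at different speeds. A correction step of some kind would be needed to close that gap, and this sketch does not attempt one.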