AI Summary
This work addresses inaccurate crowd counting on railway platforms caused by dense occlusions, camera motion, and perspective distortion during train arrivals. The authors propose a real-time multi-object tracking framework that integrates detection, appearance features, and 3D physical motion constraints. Its main contribution is a physically consistent 3D motion prior, derived from pinhole geometry, for tracking in dynamic scenes. The pipeline combines a YOLOv11m detector, an EfficientNet-B0 appearance encoder, the DeepSORT framework, and a physics-constrained Kalman filter (Phys-3D), augmented with a virtual counting band. Evaluated on the MOT-RailwayPlatformCrowdHead (MOT-RPCH) dataset, the method achieves a counting error of 2.97%, significantly outperforming existing approaches.
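As a rough illustration of why pinhole geometry constrains image motion, the sketch below predicts where a static point should appear after the camera moves sideways. It is not the Phys-3D filter itself: the function names, the assumption of a known depth, and the pure lateral camera translation are all hypothetical simplifications introduced here.

```python
def project(x_m, z_m, focal_px, cx_px):
    """Pinhole projection: lateral offset x_m and depth z_m (metres) to a pixel column."""
    return focal_px * x_m / z_m + cx_px


def predict_column(u_px, z_m, focal_px, cx_px, cam_speed, dt):
    """Predict the pixel column of a static point after the camera
    translates cam_speed * dt metres along its +X axis.

    Assumes (hypothetically) that the depth z_m is known and unchanged.
    """
    # Back-project the pixel column to a lateral offset at the assumed depth.
    x_m = (u_px - cx_px) * z_m / focal_px
    # A static point shifts backwards in the frame of the moving camera.
    x_new = x_m - cam_speed * dt
    return project(x_new, z_m, focal_px, cx_px)


if __name__ == "__main__":
    # Same camera motion, two depths: the nearer point shifts more in the image.
    near = predict_column(640.0, 5.0, 1000.0, 640.0, cam_speed=2.0, dt=0.1)
    far = predict_column(640.0, 10.0, 1000.0, 640.0, cam_speed=2.0, dt=0.1)
    print(near, far)
```

Under these assumptions the image shift equals `focal_px * cam_speed * dt / z_m`, so the expected pixel motion is tied to depth rather than being a free parameter of the motion model.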
Abstract
Accurate, real-time crowd counting on railway platforms is essential for safety and capacity management. We propose using a single camera mounted in a train that scans the platform as the train arrives. Although the hardware requirements are simple, counting remains challenging due to dense occlusions, camera motion, and perspective distortions during train arrivals. Most existing tracking-by-detection approaches assume static cameras or ignore physical consistency in motion modeling, leading to unreliable counting under dynamic conditions. We present a physics-constrained tracking framework that unifies detection, appearance, and 3D motion reasoning in a real-time pipeline. Our approach integrates a transfer-learned YOLOv11m detector with EfficientNet-B0 appearance encoding within DeepSORT, and introduces a physics-constrained Kalman model (Phys-3D) that enforces physically plausible 3D motion dynamics through pinhole geometry. To address counting brittleness under occlusions, we implement a virtual counting band with persistence. On our platform benchmark, the MOT-RailwayPlatformCrowdHead Dataset (MOT-RPCH), our method reduces counting error to 2.97%, demonstrating robust performance despite motion and occlusions. Our results show that incorporating first-principles geometry and motion priors enables reliable crowd counting in safety-critical transportation scenarios, facilitating effective train scheduling and platform safety management.
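The abstract does not specify how the virtual counting band with persistence works. One plausible reading, sketched below, counts a track once it has been observed inside a vertical image band for a minimum number of frames, and keeps its progress when the track is briefly missing. The class name `CountingBand`, its parameters, and this interpretation of "persistence" are assumptions made for illustration, not the authors' implementation.

```python
class CountingBand:
    """Hypothetical counting band: counts each track ID at most once."""

    def __init__(self, x_min, x_max, min_frames=3):
        self.x_min = x_min            # left edge of the band, in pixels
        self.x_max = x_max            # right edge of the band, in pixels
        self.min_frames = min_frames  # frames inside the band required to count
        self.frames_in_band = {}      # track_id -> frames seen inside the band
        self.counted = set()          # track IDs that have already been counted

    def update(self, tracks):
        """Process one frame.

        tracks maps track_id -> x centre (pixels) for tracks visible in this frame.
        Tracks absent from the frame (e.g. occluded) keep their progress.
        Returns the running count.
        """
        for track_id, x in tracks.items():
            if self.x_min <= x <= self.x_max:
                seen = self.frames_in_band.get(track_id, 0) + 1
                self.frames_in_band[track_id] = seen
                if seen >= self.min_frames:
                    self.counted.add(track_id)  # set membership prevents double counts
            else:
                # Observed outside the band: reset this track's progress.
                self.frames_in_band[track_id] = 0
        return len(self.counted)


if __name__ == "__main__":
    band = CountingBand(x_min=100, x_max=200, min_frames=2)
    frames = [{1: 150, 2: 50}, {1: 160, 2: 120}, {2: 130}, {1: 170, 2: 300}]
    for frame in frames:
        total = band.update(frame)
    print(total)
```

Requiring several frames inside the band filters out short-lived false tracks, while tolerating missing frames keeps a briefly occluded person from losing progress; both thresholds would need tuning on real data.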