🤖 AI Summary
In high-density urban scenarios, 3D pedestrian perception remains challenging due to severe occlusion and clutter, while manual annotation of ground-truth trajectories is prohibitively expensive, especially for tail-case pedestrians. Method: We introduce the first multi-view LiDAR-camera fusion multi-object tracking benchmark designed specifically for crowded pedestrians, together with an offline automatic annotation system that generates trajectory-level ground truth via joint cross-modal reconstruction from point clouds and images. Our tracking-by-detection framework learns high-resolution representations that are density-aware and relation-aware, jointly modeling pedestrian density distributions and interaction graphs from multi-view images and sparse LiDAR point clouds. Contribution/Results: On our newly established benchmark, our method significantly improves 3D tracking accuracy, and the automatic annotation pipeline speeds up labeling by over 3×. Both code and dataset will be publicly released.
📝 Abstract
Perceiving pedestrians in highly crowded urban environments is a difficult long-tail problem for learning-based autonomous perception. Speeding up 3D ground-truth generation for such scenes is performance-critical yet hard: the captured pedestrian point clouds are sparse, and suitable benchmarks for a dedicated system-design study are lacking. To tackle these challenges, we first collect a new multi-view LiDAR-camera 3D multiple-object tracking benchmark of highly crowded pedestrians for in-depth analysis. We then build an offboard auto-labeling system that reconstructs pedestrian trajectories from LiDAR point clouds and multi-view images. To improve generalization to crowded scenes and performance on small objects, we propose learning high-resolution representations that are density-aware and relationship-aware. Extensive experiments validate that our approach significantly improves 3D pedestrian tracking performance, enabling higher auto-labeling efficiency. The code will be publicly available at this HTTP URL.