Learning better representations for crowded pedestrians in offboard LiDAR-camera 3D tracking-by-detection

📅 2025-05-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In high-density urban scenarios, 3D pedestrian perception remains challenging due to severe occlusion and clutter, while manual annotation of ground truth trajectories is prohibitively expensive—especially for tail-case pedestrians. Method: We introduce the first multi-view LiDAR-camera fusion multi-object tracking benchmark specifically designed for crowded pedestrians, coupled with an offline automatic annotation system that generates trajectory-level ground truth via cross-modal point cloud–image joint reconstruction. Our tracking-by-detection framework employs a density-aware and relation-aware high-resolution representation learning mechanism, jointly modeling pedestrian density distributions and interaction graphs from multi-view images and sparse LiDAR point clouds. Contribution/Results: Evaluated on our newly established benchmark, our method achieves significant improvements in 3D tracking accuracy; the automatic annotation pipeline accelerates labeling efficiency by over 3×. Both code and dataset will be publicly released.

Technology Category

Application Category

📝 Abstract
Perceiving pedestrians in highly crowded urban environments is a difficult long-tail problem for learning-based autonomous perception. Speeding up 3D ground truth generation for such challenging scenes is performance-critical yet very challenging. The difficulties include the sparsity of the captured pedestrian point cloud and a lack of suitable benchmarks for a specific system design study. To tackle the challenges, we first collect a new multi-view LiDAR-camera 3D multiple-object-tracking benchmark of highly crowded pedestrians for in-depth analysis. We then build an offboard auto-labeling system that reconstructs pedestrian trajectories from LiDAR point cloud and multi-view images. To improve the generalization power for crowded scenes and the performance for small objects, we propose to learn high-resolution representations that are density-aware and relationship-aware. Extensive experiments validate that our approach significantly improves the 3D pedestrian tracking performance towards higher auto-labeling efficiency. The code will be publicly available at this HTTP URL.
Problem

Research questions and friction points this paper is trying to address.

Improving 3D pedestrian tracking in crowded urban environments
Speeding up 3D ground truth generation for crowded scenes
Learning density-aware representations for small crowded objects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view LiDAR-camera benchmark for crowded pedestrians
Offboard auto-labeling system for trajectory reconstruction
Density-aware high-resolution representations for small objects
🔎 Similar Papers
No similar papers found.
S
Shichao Li
Department of Perception, Zhuoyu Technology, Shenzhen, China
P
Peiliang Li
Department of Perception, Zhuoyu Technology, Shenzhen, China
Qing Lian
Qing Lian
HKUST
Peng Yun
Peng Yun
Ph.D. in CSE, HKUST
3D PerceptionIncremental LearningBayesian Neural NetworksCloud Robotics
Xiaozhi Chen
Xiaozhi Chen
ZYT
Machine LearningComputer Vision