Learning better representations for crowded pedestrians in offboard LiDAR-camera 3D tracking-by-detection

📅 2025-05-21

📈 Citations: 0

✨ Influential: 0

career value

198K/year

🤖 AI Summary

In high-density urban scenarios, 3D pedestrian perception remains challenging due to severe occlusion and clutter, while manual annotation of ground truth trajectories is prohibitively expensive—especially for tail-case pedestrians. Method: We introduce the first multi-view LiDAR-camera fusion multi-object tracking benchmark specifically designed for crowded pedestrians, coupled with an offline automatic annotation system that generates trajectory-level ground truth via cross-modal point cloud–image joint reconstruction. Our tracking-by-detection framework employs a density-aware and relation-aware high-resolution representation learning mechanism, jointly modeling pedestrian density distributions and interaction graphs from multi-view images and sparse LiDAR point clouds. Contribution/Results: Evaluated on our newly established benchmark, our method achieves significant improvements in 3D tracking accuracy; the automatic annotation pipeline accelerates labeling efficiency by over 3×. Both code and dataset will be publicly released.

Technology Category

Application Category

📝 Abstract

Perceiving pedestrians in highly crowded urban environments is a difficult long-tail problem for learning-based autonomous perception. Speeding up 3D ground truth generation for such challenging scenes is performance-critical yet very challenging. The difficulties include the sparsity of the captured pedestrian point cloud and a lack of suitable benchmarks for a specific system design study. To tackle the challenges, we first collect a new multi-view LiDAR-camera 3D multiple-object-tracking benchmark of highly crowded pedestrians for in-depth analysis. We then build an offboard auto-labeling system that reconstructs pedestrian trajectories from LiDAR point cloud and multi-view images. To improve the generalization power for crowded scenes and the performance for small objects, we propose to learn high-resolution representations that are density-aware and relationship-aware. Extensive experiments validate that our approach significantly improves the 3D pedestrian tracking performance towards higher auto-labeling efficiency. The code will be publicly available at this HTTP URL.

Problem

Research questions and friction points this paper is trying to address.

Improving 3D pedestrian tracking in crowded urban environments

Speeding up 3D ground truth generation for crowded scenes

Learning density-aware representations for small crowded objects

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view LiDAR-camera benchmark for crowded pedestrians

Offboard auto-labeling system for trajectory reconstruction

Density-aware high-resolution representations for small objects

🔎 Similar Papers

No similar papers found.