TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses LiDAR-based 3D pedestrian detection under occlusion and in cluttered scenes by proposing an efficient, real-time, LiDAR-only detector. The approach encodes the 3D point cloud into a lightweight 2D tensor using a height-aware, three-band bird's-eye-view (BEV) representation and uses a single-stage network to jointly detect cars, pedestrians, and cyclists. Key components include a bidirectional high-resolution feature pyramid over P1–P4, area attention in the deep backbone stages, a rotated IoU loss, and distribution focal learning for box side offsets, which together improve robustness and accuracy under occlusion. On the KITTI dataset, the method achieves pedestrian BEV AP scores of 58.7, 52.6, and 47.2 on the easy, moderate, and hard subsets, respectively, at 49 FPS on a single consumer GPU, outperforming Complex-YOLO by 12.6%, 7.5%, and 3.1% and showing promise for real-world deployment.
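The summary credits part of the occlusion robustness to distribution focal learning (DFL). The paper's exact head is not described here, so the following is a minimal Python sketch of the generic DFL formulation introduced with Generalized Focal Loss: each box side offset is predicted as a discrete distribution over bins 0..reg_max and decoded as its expectation, and the loss is a cross-entropy on the two integer bins that bracket the continuous target. The bin count (`REG_MAX = 16`) and the function names are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical bin count; not stated in the paper (16 is a common choice in YOLOv8-style heads).
REG_MAX = 16


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one side's bin logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_side_offset(logits: Sequence[float]) -> float:
    """Decode one side offset (in grid-stride units) as the expectation over bins 0..reg_max."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


def dfl_loss(logits: Sequence[float], target: float) -> float:
    """DFL for one side: weighted cross-entropy on the two integer bins around `target`."""
    reg_max = len(logits) - 1
    if not 0.0 <= target <= reg_max:
        raise ValueError(f"target {target} outside [0, {reg_max}]")
    # Left/right bins bracketing the target; clamp so target == reg_max still has a right bin.
    left = min(int(math.floor(target)), reg_max - 1)
    right = left + 1
    # Linear interpolation weights: the closer bin gets the larger weight.
    w_left, w_right = right - target, target - left
    probs = softmax(logits)
    eps = 1e-12  # avoids log(0)
    return -(w_left * math.log(probs[left] + eps) + w_right * math.log(probs[right] + eps))
```

With uniform logits over 17 bins the decoded offset is the midpoint, 8; a sharp peak at bin 3 decodes to a value very close to 3 and gives a DFL loss near zero for a target of 3.0. Decoding by expectation rather than argmax keeps the offset continuous, which is what lets a flat distribution express uncertainty about an occluded side.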

📝 Abstract

Safe autonomous agents and mobile robots need fast, real-time 3D perception, especially for vulnerable road users (VRUs) such as pedestrians. We introduce a new bird's-eye-view (BEV) encoding that maps the full 3D LiDAR point cloud into a lightweight 2D BEV tensor with three height bands. We explicitly reformulate 3D detection as a 2D detection problem and then reconstruct 3D boxes from the BEV outputs. A single network detects cars, pedestrians, and cyclists in one pass. The backbone uses area attention at deep stages, a hierarchical bidirectional neck over P1 to P4 fuses context and detail, and the head predicts oriented boxes using distribution focal learning for side offsets and a rotated IoU loss. Training applies a small vertical re-binning and a mild reflectance jitter in channel space to resist memorization. We use an interquartile range (IQR) filter to remove noisy and outlier LiDAR points during 3D reconstruction. On the KITTI dataset, TriBand-BEV attains 58.7/52.6/47.2 pedestrian BEV AP (%) on easy, moderate, and hard at 49 FPS on a single consumer GPU, surpassing Complex-YOLO with gains of +12.6%, +7.5%, and +3.1%. Qualitative scenes show stable detection under occlusion. The pipeline is compact and ready for real-time robotic deployment. Our source code is publicly available on GitHub.
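The abstract says an IQR filter removes noisy and outlier LiDAR points during 3D reconstruction but does not specify which coordinate it acts on or the fence multiplier. The following is a minimal sketch under stated assumptions: the filter is applied to the z values of the points that fall inside one detected BEV box, it uses Tukey's fences with k = 1.5, and the box's bottom and height are then read from the surviving points. `iqr_filter_z` and `vertical_extent` are hypothetical names, not functions from the released code.

```python
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def iqr_filter_z(points: Sequence[Point], k: float = 1.5) -> List[Point]:
    """Drop points whose z lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: filtering is done per detected box, on the z coordinate only.
    """
    zs = [p[2] for p in points]
    if len(zs) < 4:
        # Too few points to estimate quartiles meaningfully; keep them all (assumed policy).
        return list(points)
    q1, _, q3 = statistics.quantiles(zs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in points if low <= p[2] <= high]


def vertical_extent(points: Sequence[Point], k: float = 1.5) -> Tuple[float, float]:
    """Return (bottom z, height) of a box from its IQR-filtered points."""
    kept = iqr_filter_z(points, k)
    if not kept:
        raise ValueError("no points left after filtering")
    zs = [p[2] for p in kept]
    z_min, z_max = min(zs), max(zs)
    return z_min, z_max - z_min
```

For a pedestrian box whose returns span roughly -1.6 m to 0.1 m plus one stray point at 3.0 m (for example, from an overhanging branch), the stray point falls above the upper fence and the recovered height is 1.7 m instead of the 4.6 m that a plain min/max would give.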

Problem

Research questions and friction points this paper is trying to address.

3D pedestrian detection

LiDAR-only perception

real-time autonomous driving

vulnerable road users

BEV representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

TriBand-BEV

height-aware BEV

LiDAR-only 3D detection
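The height-aware three-band BEV listed above can be illustrated with a minimal NumPy sketch. The abstract only says the point cloud is mapped to a 2D BEV tensor with three height bands; everything else here is an assumption: a z window of [-2.73, 1.27] m split into three roughly equal bands, a 0.1 m grid, and a per-band point-density channel normalized as min(1, log(1 + N) / log(64)) in the style of Complex-YOLO. `triband_bev` is a hypothetical name, not the API of the released code.

```python
import numpy as np


def triband_bev(points, x_range=(0.0, 50.0), y_range=(-25.0, 25.0),
                z_edges=(-2.73, -1.40, -0.07, 1.27), res=0.1):
    """Encode an (N, >=3) LiDAR array (x, y, z, ...) as an (n_bands, H, W) float32 BEV tensor.

    Channel b holds the normalized point density of height band b. Ranges, band edges,
    resolution and the density normalization are illustrative assumptions.
    """
    pts = np.asarray(points, dtype=np.float64)
    x, y, z = pts[:, 0], pts[:, 1], pts[:, 2]
    n_bands = len(z_edges) - 1
    h = int(round((x_range[1] - x_range[0]) / res))
    w = int(round((y_range[1] - y_range[0]) / res))

    # Keep only points inside the BEV window and the overall height window.
    keep = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_edges[0]) & (z < z_edges[-1]))
    x, y, z = x[keep], y[keep], z[keep]

    # Grid indices: rows along x (forward), columns along y (lateral); clip guards float edges.
    rows = np.clip(((x - x_range[0]) / res).astype(np.int64), 0, h - 1)
    cols = np.clip(((y - y_range[0]) / res).astype(np.int64), 0, w - 1)
    # Band index 0..n_bands-1 from the interior band edges.
    bands = np.digitize(z, np.asarray(z_edges[1:-1]))

    counts = np.zeros((n_bands, h, w), dtype=np.float64)
    np.add.at(counts, (bands, rows, cols), 1.0)  # unbuffered scatter-add handles repeated cells
    return np.minimum(1.0, np.log1p(counts) / np.log(64.0)).astype(np.float32)
```

Because the band edges are a parameter, the small vertical re-binning augmentation mentioned in the abstract could plausibly be approximated by perturbing `z_edges` slightly at training time; this is a guess, not the paper's recipe. Keeping one channel per band lets a standard 2D detector see coarse vertical structure (legs, torso, head) without a 3D backbone.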