PillarTrack: Boosting Pillar Representation for Transformer-based 3D Single Object Tracking on Point Clouds

📅 2024-04-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
To address the information redundancy or loss caused by point re-sampling in LiDAR-based 3D single object tracking, this paper proposes PillarTrack, a pillar-based tracking framework. It (1) converts sparse point clouds into dense pillars instead of re-sampling points, preserving local and global geometry; (2) introduces a Pyramid-Encoded Pillar Feature Encoder (PE-PFE) that makes pillar features more robust to translation, rotation, and scale variations; and (3) uses an efficient Transformer-based backbone designed around modality differences. Experiments on KITTI and nuScenes show performance comparable to existing methods and a significant improvement over the baseline.

📝 Abstract
LiDAR-based 3D single object tracking (3D SOT) is a critical problem in robotics and autonomous driving. Existing 3D SOT methods typically adhere to a point-based processing pipeline, in which the re-sampling operation invariably leads to either redundant or missing information, thereby degrading performance. To address these issues, we propose PillarTrack, a novel pillar-based 3D SOT framework. First, we transform sparse point clouds into dense pillars to preserve the local and global geometry. Second, we propose a Pyramid-Encoded Pillar Feature Encoder (PE-PFE) design to enhance the robustness of pillar features to translation, rotation, and scale. Third, we present an efficient Transformer-based backbone designed from the perspective of modality differences. Finally, we construct PillarTrack based on the above designs. Extensive experiments show that our method achieves comparable performance on the KITTI and nuScenes datasets, significantly enhancing the performance of the baseline.
Problem

Research questions and friction points this paper is trying to address.

Enhancing 3D single object tracking using pillar representation
Addressing redundant or missing information in point-based processing
Improving robustness to translation, rotation, and scale variations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transforms point clouds into dense pillars
Uses Pyramid-Encoded Pillar Feature Encoder
Implements efficient Transformer-based backbone
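
To make the first idea concrete, here is a minimal sketch of how points can be grouped into pillars, i.e. vertical columns over a fixed x-y grid, without discarding or duplicating any point. It is not the paper's implementation: the function names, the 0.5 m pillar size, the grid origin, and the hand-made count-and-mean feature are all assumptions chosen for illustration, and the mean is only a stand-in for a learned encoder such as PE-PFE.

```python
from collections import defaultdict
from math import floor


def pillarize(points, pillar_size=0.5, x_min=0.0, y_min=0.0):
    """Group (x, y, z) points into vertical pillars on an x-y grid.

    pillar_size, x_min and y_min are illustrative values,
    not settings taken from the paper.
    """
    pillars = defaultdict(list)
    for x, y, z in points:
        # The grid cell is determined by x and y only; z is kept untouched,
        # so each pillar spans the full height of the scene.
        ix = floor((x - x_min) / pillar_size)
        iy = floor((y - y_min) / pillar_size)
        pillars[(ix, iy)].append((x, y, z))
    return dict(pillars)


def pillar_features(pillars):
    """Per pillar: point count and mean (x, y, z).

    A hand-made placeholder for a learned pillar feature encoder.
    """
    feats = {}
    for cell, pts in pillars.items():
        n = len(pts)
        mean = tuple(sum(p[i] for p in pts) / n for i in range(3))
        feats[cell] = (n, mean)
    return feats


# Three toy points: two fall into the same pillar, one into another.
points = [(0.1, 0.2, 0.0), (0.3, 0.4, 1.0), (1.2, 0.1, 0.5)]
pillars = pillarize(points)
feats = pillar_features(pillars)
```

Every input point ends up in exactly one pillar, which is the property that separates this kind of grid grouping from re-sampling to a fixed number of points.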