RoPETR: Improving Temporal Camera-Only 3D Detection by Integrating Enhanced Rotary Position Embedding

📅 2025-04-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses inaccurate velocity estimation in vision-only temporal 3D object detection, which limits performance on the NuScenes benchmark, particularly the NuScenes Detection Score (NDS). To tackle this, we propose a velocity-oriented enhanced Rotary Position Embedding (RoPE) that explicitly incorporates motion priors and strengthens cross-frame feature alignment and temporal motion representation. We further design an end-to-end trainable temporal fusion module, tightly integrated with the StreamPETR architecture built upon a ViT-L backbone. Our method improves velocity prediction accuracy without requiring additional sensors or post-processing. Evaluated on the NuScenes test set, it achieves a state-of-the-art NDS of 70.86%, indicating that refined temporal position modeling is important for accurate motion estimation in vision-only 3D detection.

📝 Abstract
This technical report introduces a targeted improvement to the StreamPETR framework, specifically aimed at enhancing velocity estimation, a critical factor influencing the overall NuScenes Detection Score (NDS). While StreamPETR exhibits strong 3D bounding box detection performance, as reflected by its high mean Average Precision (mAP), our analysis identified velocity estimation as a substantial bottleneck when evaluated on the NuScenes dataset. To overcome this limitation, we propose a customized positional embedding strategy tailored to enhance temporal modeling capabilities. Experimental evaluations conducted on the NuScenes test set demonstrate that our improved approach achieves a state-of-the-art NDS of 70.86% using the ViT-L backbone, setting a new benchmark for camera-only 3D object detection.
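To make concrete why velocity error weighs on NDS, the sketch below computes the score from its public definition: NDS = (5 * mAP + sum over the five true-positive errors of (1 - min(1, error))) / 10, where the five errors are translation (mATE), scale (mASE), orientation (mAOE), velocity (mAVE), and attribute (mAAE). The function name `nds` and the numbers in `example_errors` are illustrative assumptions, not results from this report.

```python
def nds(mean_ap, tp_errors):
    """Compute the NuScenes Detection Score.

    mean_ap:   mean Average Precision in [0, 1].
    tp_errors: dict with the five mean true-positive errors
               (mATE, mASE, mAOE, mAVE, mAAE).
    """
    # Each error is clipped at 1 and turned into a score (1 - error),
    # so a lower velocity error (mAVE) directly raises NDS.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Hypothetical values chosen only to illustrate the arithmetic.
example_errors = {
    "mATE": 0.50,
    "mASE": 0.25,
    "mAOE": 0.30,
    "mAVE": 0.30,  # velocity error in m/s
    "mAAE": 0.15,
}

baseline = nds(0.60, example_errors)
# Same detector, but with the velocity error reduced by 0.1 m/s.
improved = nds(0.60, {**example_errors, "mAVE": 0.20})
```

With these made-up inputs, `baseline` is about 0.65 and `improved` about 0.66: lowering mAVE by 0.1 lifts NDS by one point even though mAP is unchanged, which is the lever the report targets.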
Problem

Research questions and friction points this paper is trying to address.

Enhancing velocity estimation in camera-only 3D detection
Improving temporal modeling with customized positional embedding
Achieving state-of-the-art NDS on NuScenes dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced Rotary Position Embedding integration
Customized positional embedding strategy
State-of-the-art camera-only 3D detection
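For readers unfamiliar with rotary position embedding, the following is a minimal NumPy sketch of the standard formulation, in which consecutive feature pairs are rotated by position-dependent angles. It is not the enhanced variant proposed in this report, whose details are not given on this page; the function name `rotary_embed`, the `base` value, and the use of scalar positions (for example, frame indices or timestamps) are assumptions made for illustration.

```python
import numpy as np


def rotary_embed(x, positions, base=10000.0):
    """Apply standard rotary position embedding.

    x:         array of shape (n, d) with d even (queries or keys).
    positions: array of shape (n,) with one scalar position per row.
    """
    n, d = x.shape
    assert d % 2 == 0, "feature dimension must be even"
    # One rotation frequency per pair of feature channels.
    freqs = base ** (-np.arange(0, d, 2) / d)        # shape (d/2,)
    angles = positions[:, None] * freqs[None, :]     # shape (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    # 2D rotation of each (even, odd) channel pair.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


# The attention logit between a rotated query and key depends only on
# the difference of their positions, not on the absolute values.
rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))
k = rng.normal(size=(1, 8))
logit_a = rotary_embed(q, np.array([3.0])) @ rotary_embed(k, np.array([1.0])).T
logit_b = rotary_embed(q, np.array([7.0])) @ rotary_embed(k, np.array([5.0])).T
```

Here `logit_a` and `logit_b` agree up to floating-point error because both pairs are two positions apart. This relative-offset property is one plausible reason a rotary scheme suits temporal modeling across frames.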
Hang Ji
Udeer.ai
Tao Ni
Udeer.ai
Xufeng Huang
Udeer.ai
Tao Luo
Udeer.ai
Xin Zhan
Machine Learning Engineer, Apple Inc.
Machine Learning, Computer Architecture
Junbo Chen
Udeer.ai