🤖 AI Summary
This work proposes a novel approach to LiDAR-based 3D object detection by formulating it as an autoregressive sequence generation task, eliminating the need for hand-crafted anchor assignments and non-maximum suppression (NMS). The method decodes object parameters—including center location, size, orientation, velocity, and class—as discrete tokens in a causal, near-to-far order directly from point cloud features. By discarding anchors and NMS, the framework enables fully end-to-end training and seamlessly integrates with diverse point cloud backbone architectures. Furthermore, it opens new avenues for leveraging language modeling techniques—such as GRPO reinforcement learning—to optimize perceptual objectives. Evaluated on the nuScenes benchmark, the proposed approach achieves performance comparable to state-of-the-art methods, demonstrating the feasibility of anchor-free, NMS-free 3D detection.
📝 Abstract
LiDAR-based 3D object detectors typically rely on proposal heads with hand-crafted components like anchor assignment and non-maximum suppression (NMS), complicating training and limiting extensibility. We present AutoReg3D, an autoregressive 3D detector that casts detection as sequence generation. Given point-cloud features, AutoReg3D emits objects in a range-causal (near-to-far) order and encodes each object as a short, discrete-token sequence consisting of its center, size, orientation, velocity, and class. This near-to-far ordering mirrors LiDAR geometry--near objects occlude far ones but not vice versa--enabling straightforward teacher forcing during training and autoregressive decoding at test time. AutoReg3D is compatible with diverse point-cloud backbones and attains competitive nuScenes performance without anchors or NMS. Beyond parity, the sequential formulation unlocks language-model advances for 3D perception, including GRPO-style reinforcement learning for task-aligned objectives. These results position autoregressive decoding as a viable, flexible alternative for LiDAR-based detection and open a path to importing modern sequence-modeling tools into 3D perception.
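To make the tokenization concrete, here is a minimal sketch of how one might discretize object parameters and assemble a near-to-far target sequence for teacher forcing. The bin counts, value ranges, field order, and vocabulary layout below are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch only: encode each 3D object (center, size, yaw,
# velocity, class) as discrete tokens, ordered near-to-far by range.
# All bin edges and the vocabulary layout are assumed, not from the paper.
import math

NUM_BINS = 256                 # assumed per-field quantization resolution
COORD_RANGE = (-54.0, 54.0)    # assumed nuScenes-style detection range (m)

def quantize(value, lo, hi, num_bins=NUM_BINS):
    """Map a continuous value in [lo, hi] to a discrete token id."""
    t = (value - lo) / (hi - lo)
    return min(num_bins - 1, max(0, int(t * num_bins)))

def object_to_tokens(obj):
    """Encode one object's parameters as a short token sequence."""
    lo, hi = COORD_RANGE
    return [
        quantize(obj["x"], lo, hi),
        quantize(obj["y"], lo, hi),
        quantize(obj["z"], -5.0, 3.0),           # assumed height range
        quantize(obj["w"], 0.0, 15.0),           # assumed size ranges
        quantize(obj["l"], 0.0, 15.0),
        quantize(obj["h"], 0.0, 8.0),
        quantize(obj["yaw"], -math.pi, math.pi),
        quantize(obj["vx"], -20.0, 20.0),        # assumed velocity range
        quantize(obj["vy"], -20.0, 20.0),
        NUM_BINS + obj["cls"],                   # class ids follow coord vocab
    ]

def build_target_sequence(objects):
    """Sort objects near-to-far by BEV range, then flatten their tokens.

    At training time this sequence is the teacher-forcing target; at test
    time the model decodes tokens in the same order autoregressively.
    """
    ordered = sorted(objects, key=lambda o: math.hypot(o["x"], o["y"]))
    seq = []
    for obj in ordered:
        seq.extend(object_to_tokens(obj))
    return seq
```

Because nearer objects can occlude farther ones but not the reverse, sorting by range gives every token a causal prefix that is geometrically "available" when it is predicted.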