3DTMDet: A Dual-Path Synergy Network of Transformer and SSM for 3D Object Detection in Point Clouds

📅 2026-05-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a tension in 3D object detection from point clouds: distant points are extremely sparse, yet detecting distant objects requires effective global context modeling. The authors propose 3DTMDet, a dual-path architecture that combines a state space model (Mamba) with a local-attention Transformer. Its core component, the 3D Hybrid Mamba Transformer (3DHMT) block, captures long-range dependencies among sparse, distant points while preserving fine-grained local geometric detail. A LiDAR-aware voxel feature diffusion mechanism further strengthens the representation of distant and occluded regions by propagating features along the sensor's observation direction. On the KITTI and ONCE benchmarks, the authors report that the method outperforms state-of-the-art detectors.
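The voxel feature diffusion idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `diffuse_along_rays`, the parameters `steps` and `decay`, the sensor-at-origin convention, and the fixed exponential decay are all assumptions made for illustration. It only shows the general pattern of copying attenuated features from occupied voxels into empty voxels farther along the same sensor ray.

```python
import math

def diffuse_along_rays(voxels, voxel_size=1.0, steps=2, decay=0.5):
    """Hypothetical sketch: push decayed features outward along sensor rays.

    voxels: dict mapping integer (ix, iy, iz) indices to feature lists.
    Assumption: the sensor sits at the origin of the voxel grid.
    """
    out = {idx: list(feat) for idx, feat in voxels.items()}
    best_weight = {}  # strongest weight written so far into each empty voxel
    for (ix, iy, iz), feat in voxels.items():
        # Voxel centre in metric coordinates (never exactly the origin).
        cx, cy, cz = ((i + 0.5) * voxel_size for i in (ix, iy, iz))
        rng = math.sqrt(cx * cx + cy * cy + cz * cz)
        ux, uy, uz = cx / rng, cy / rng, cz / rng  # unit ray direction
        for k in range(1, steps + 1):
            # Move k voxel lengths farther away from the sensor.
            d = k * voxel_size
            target = (
                math.floor((cx + d * ux) / voxel_size),
                math.floor((cy + d * uy) / voxel_size),
                math.floor((cz + d * uz) / voxel_size),
            )
            if target in voxels:
                continue  # never overwrite an observed voxel
            w = decay ** k
            # If several rays reach the same empty voxel, keep the strongest.
            if w > best_weight.get(target, 0.0):
                best_weight[target] = w
                out[target] = [w * f for f in feat]
    return out

demo = diffuse_along_rays({(4, 0, 0): [1.0, 2.0]}, steps=2, decay=0.5)
```

In this toy setup a single occupied voxel produces two new voxels behind it along the ray, each with a weaker copy of its features. A learned module would presumably predict these features rather than copy them with a fixed decay.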
📝 Abstract
A fundamental challenge in point cloud object detection lies in the conflict between the extreme sparsity of distant points and the need for long-range context understanding. Existing methods typically use 1D serialization to expand the receptive field, which inevitably discards already scarce local geometric details and degrades the detection of distant and small objects. To address this issue, we propose 3DTMDet, a novel detection network that synergistically combines state space models (SSMs, specifically Mamba) with Transformers. The core idea is to exploit the linear complexity of SSMs and their strength in long-sequence modeling to capture global interactions between sparse, distant points, while using Transformer modules with local attention to encode fine-grained geometric structures in local point sets, preserving accurate shape information. We propose the 3D Hybrid Mamba Transformer (3DHMT) block, which uses an SSM-Attention-SSM pipeline to balance global context understanding and local detail preservation, alleviating the tension between receptive field enlargement and geometric preservation in long-range detection. In addition, we introduce a voxel generation block inspired by LiDAR physics, which diffuses features along the sensor observation direction to reconstruct the complete structure of objects in occluded and distant areas. Extensive experiments on the KITTI and ONCE datasets show that 3DTMDet outperforms state-of-the-art detectors. The code is available at https://github.com/QiuBingwen/3DTMDet.
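The SSM-Attention-SSM ordering described in the abstract can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the 3DHMT block itself: `ssm_scan` uses a fixed-coefficient linear recurrence rather than Mamba's input-dependent (selective) parameters, the residual connections are assumed, and all function names and parameters (`a`, `b`, `window`) are hypothetical.

```python
import math

def ssm_scan(xs, a=0.9, b=0.1):
    """Linear recurrence h_t = a * h_{t-1} + b * x_t over a serialized sequence.

    Runs in time linear in the sequence length, so every position can
    receive information from all earlier positions cheaply.
    """
    h = [0.0] * len(xs[0])
    out = []
    for x in xs:
        h = [a * hi + b * xi for hi, xi in zip(h, x)]
        out.append(h)
    return out

def local_attention(xs, window=1):
    """Softmax dot-product attention limited to `window` neighbours per side."""
    out = []
    dim = len(xs[0])
    for i, q in enumerate(xs):
        idx = range(max(0, i - window), min(len(xs), i + window + 1))
        scores = [sum(qc * kc for qc, kc in zip(q, xs[j])) / math.sqrt(dim)
                  for j in idx]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * xs[j][c] for w, j in zip(weights, idx)) / z
                    for c in range(dim)])
    return out

def residual(xs, ys):
    """Element-wise sum of two sequences of feature vectors."""
    return [[x + y for x, y in zip(xr, yr)] for xr, yr in zip(xs, ys)]

def hybrid_block(xs, window=1):
    """Global scan, then local attention, then a second global scan."""
    xs = residual(xs, ssm_scan(xs))
    xs = residual(xs, local_attention(xs, window))
    xs = residual(xs, ssm_scan(xs))
    return xs

features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0]]
mixed = hybrid_block(features, window=1)
```

The output keeps the input's length and feature width. The scans carry context across the whole sequence, while the attention step only mixes each position with its immediate neighbours, which is the division of labour the abstract describes.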
Problem

Research questions and friction points this paper is trying to address.

3D object detection
point clouds
sparsity
long-range context
geometric detail preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

State Space Model
Transformer
3D Object Detection
Point Cloud
Dual-Path Architecture
Bingwen Qiu
School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, 210094, Jiangsu, China
Yuan Liu
School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, 210094, Jiangsu, China
Junqi Bai
The 28th Research Institute of China Electronics Technology Group Corporation, Nanjing, 210007, Jiangsu, China
Tong Jiang
School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, 210094, Jiangsu, China
Ben Liang
Department of Electrical and Computer Engineering, University of Toronto
Networked Systems, Wireless Communications, Mobile Computing, Mobility Management
Fangzhou Chen
College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing, 210016, Jiangsu, China
Xiubao Sui
Nanjing University of Science and Technology
infrared colorization
Qian Chen
School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing, 210094, Jiangsu, China; School of Information and Communication Engineering, North University of China, Taiyuan, 030051, Shanxi, China