WD-DETR: Wavelet Denoising-Enhanced Real-Time Object Detection Transformer for Robot Perception with Event Cameras

📅 2025-06-10
🤖 AI Summary
Dense event representations built from event-camera output are highly susceptible to accumulated noise, which raises false-negative rates and limits real-time robotic perception. To address this, we propose an end-to-end wavelet-Transformer hybrid detection framework: (1) we integrate a learnable wavelet transform into the backbone network for structured, adaptive denoising of event streams; (2) we design a Dynamic Reorganization Convolution Block (DRCB) to improve the inference efficiency of the hybrid encoder; and (3) we combine the DETR decoding paradigm with TensorRT FP16 inference for efficient deployment. Our method outperforms the tested state-of-the-art methods on three major benchmarks (DSEC, Gen1, and 1Mpx) while running in real time at approximately 35 FPS on the NVIDIA Jetson Orin NX platform. It improves both noise robustness and deployment efficiency without compromising detection accuracy.

📝 Abstract
Previous studies on event camera sensing have shown that dense event representations can support object detection. However, the noise that accumulates in such dense representations has received insufficient attention; it degrades representation quality and increases the likelihood of missed detections. To address this challenge, we propose the Wavelet Denoising-enhanced DEtection TRansformer (WD-DETR) network for event cameras. First, a dense event representation is presented that enables real-time reconstruction of events as tensors. Then, a wavelet transform method is designed to filter noise in the event representations and is integrated into the backbone for feature extraction. The extracted features are subsequently fed into a transformer-based network for object prediction. To further reduce inference time, we incorporate the Dynamic Reorganization Convolution Block (DRCB) as a fusion module within the hybrid encoder. The proposed method has been evaluated on three event-based object detection datasets: DSEC, Gen1, and 1Mpx. The results demonstrate that WD-DETR outperforms the tested state-of-the-art methods. Additionally, we deploy our approach on the NVIDIA Jetson Orin NX, a common onboard computer for robots, achieving approximately 35 FPS with TensorRT FP16, which makes it well suited for real-time perception on onboard robotic systems.
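The abstract says only that the dense representation reconstructs events as tensors in real time; it does not specify which representation WD-DETR uses. As a minimal sketch, assuming a simple per-polarity event-count histogram over a fixed time window (a common dense representation, not necessarily the paper's), the accumulation step looks like this. The `(x, y, t, p)` event layout, the polarity convention, and the function name `events_to_histogram` are illustrative assumptions. Because every event is counted, sensor noise accumulates in the tensor just like signal does.

```python
from typing import List, Sequence, Tuple

# Assumed event layout (hypothetical): (x, y, t, p) = pixel column, pixel row,
# timestamp in microseconds, polarity (> 0 for ON, <= 0 for OFF).
Event = Tuple[int, int, int, int]
Tensor3D = List[List[List[int]]]


def events_to_histogram(
    events: Sequence[Event], height: int, width: int, t_start: int, t_end: int
) -> Tensor3D:
    """Accumulate events in [t_start, t_end) into a 2 x height x width tensor.

    Channel 0 counts OFF events and channel 1 counts ON events. Noise events
    are counted exactly like signal events, so longer windows accumulate
    more noise counts.
    """
    hist = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, t, p in events:
        if not (t_start <= t < t_end):
            continue  # outside the accumulation window
        if not (0 <= x < width and 0 <= y < height):
            continue  # outside the sensor array
        channel = 1 if p > 0 else 0
        hist[channel][y][x] += 1
    return hist


# Usage: three events fall in the 0 to 100 us window; the fourth is dropped.
events = [(1, 0, 10, 1), (1, 0, 20, 1), (2, 1, 30, -1), (0, 0, 999, 1)]
hist = events_to_histogram(events, height=2, width=3, t_start=0, t_end=100)
print(hist)  # → [[[0, 0, 0], [0, 0, 1]], [[0, 2, 0], [0, 0, 0]]]
```

Real pipelines often add time bins or normalization on top of raw counts; this sketch keeps only the polarity split to show where accumulation noise enters.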
Problem

Research questions and friction points this paper is trying to address.

Reducing noise in dense event representations for event cameras
Enhancing real-time object detection with wavelet denoising
Optimizing inference speed for robotic perception systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wavelet transform denoises dense event representations
Transformer network predicts objects from features
Dynamic Reorganization Convolution Block reduces inference time
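The first item above, wavelet-domain denoising, can be illustrated with a classical stand-in. WD-DETR uses a learnable wavelet transform whose filters and thresholds are trained, and their form is not given here, so the sketch below instead swaps in a fixed one-level orthonormal 2D Haar transform with hand-set soft thresholding. The function names and the threshold `tau` are assumptions. The point it shows: a uniform patch has zero detail coefficients and passes through unchanged, while an isolated event's energy lands partly in the detail bands, so shrinking those bands attenuates it.

```python
from typing import List

Image = List[List[float]]


def soft_threshold(v: float, tau: float) -> float:
    """Shrink v toward zero by tau; values with |v| <= tau become 0."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def haar_denoise(img: Image, tau: float) -> Image:
    """One-level orthonormal 2D Haar transform with soft-thresholded details.

    Each 2x2 block [[a, b], [c, d]] is split into one approximation
    coefficient (ll) and three detail coefficients; only the details are
    shrunk before the block is reconstructed. Height and width must be even.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    out = [[0.0] * w for _ in range(h)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            # Forward transform with orthonormal scaling of 1/2.
            ll = (a + b + c + d) / 2
            d_col = soft_threshold((a - b + c - d) / 2, tau)  # left vs right column
            d_row = soft_threshold((a + b - c - d) / 2, tau)  # top vs bottom row
            d_diag = soft_threshold((a - b - c + d) / 2, tau)  # diagonal difference
            # Inverse transform (exact reconstruction when tau == 0).
            out[i][j] = (ll + d_col + d_row + d_diag) / 2
            out[i][j + 1] = (ll - d_col + d_row - d_diag) / 2
            out[i + 1][j] = (ll + d_col - d_row - d_diag) / 2
            out[i + 1][j + 1] = (ll - d_col - d_row + d_diag) / 2
    return out


# Usage: an isolated spike (left 2x2 block) next to a uniform patch (right block).
frame = [[1.0, 0.0, 2.0, 2.0],
         [0.0, 0.0, 2.0, 2.0]]
den = haar_denoise(frame, tau=0.5)
print(den)  # → [[0.25, 0.25, 2.0, 2.0], [0.25, 0.25, 2.0, 2.0]]
```

A fixed transform like this only flattens the spike into a low residue; a learned transform, as described for WD-DETR, can adapt its filters and thresholds to the statistics of event noise.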
Yangjie Cui
School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
Boyang Gao
Yiwei Zhang
School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
Xin Dong
Hangzhou Innovation Institute of Beihang University, Yuhang District, Hangzhou 310023, China
Jinwu Xiang
Institute of Unmanned System, Beihang University, Beijing 100191, China
Daochun Li
School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
Zhan Tu
Professor, Beihang University
Unmanned systems, Intelligent perception, Collaborative control, Bio-inspired robots