WD-DETR: Wavelet Denoising-Enhanced Real-Time Object Detection Transformer for Robot Perception with Event Cameras

📅 2025-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Dense event representations from event cameras accumulate noise, which degrades representation quality, raises the rate of missed detections, and limits real-time robotic perception. To address this, we propose WD-DETR, an end-to-end wavelet-Transformer hybrid detection framework: (1) we integrate a wavelet transform into the backbone network to denoise the event representations during feature extraction; (2) we use a Dynamic Reorganization Convolution Block (DRCB) as the fusion module in the hybrid encoder to reduce inference time; and (3) we combine the DETR decoding paradigm with TensorRT FP16 inference for deployment. The method outperforms the tested state-of-the-art methods on three benchmarks (DSEC, Gen1, and 1Mpx) and runs at approximately 35 FPS on the NVIDIA Jetson Orin NX.

📝 Abstract

Previous studies on event camera sensing have achieved a certain level of detection performance using dense event representations. However, the noise accumulated in such dense representations has received insufficient attention, even though it degrades representation quality and increases the likelihood of missed detections. To address this challenge, we propose the Wavelet Denoising-enhanced DEtection TRansformer (WD-DETR) network for event cameras. First, a dense event representation is presented, which enables real-time reconstruction of events as tensors. Then, a wavelet transform method is designed to filter noise in the event representations, and this method is integrated into the backbone for feature extraction. The extracted features are subsequently fed into a transformer-based network for object prediction. To further reduce inference time, we incorporate the Dynamic Reorganization Convolution Block (DRCB) as a fusion module within the hybrid encoder. The proposed method has been evaluated on three event-based object detection datasets: DSEC, Gen1, and 1Mpx. The results demonstrate that WD-DETR outperforms the tested state-of-the-art methods. Additionally, we implement our approach on a common onboard computer for robots, the NVIDIA Jetson Orin NX, achieving a frame rate of approximately 35 FPS using TensorRT FP16, which is well suited to real-time perception on onboard robotic systems.
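
To make the idea of a dense event representation concrete, here is a minimal sketch that accumulates raw events into a two-channel count tensor. The abstract does not specify the representation WD-DETR uses, so this polarity histogram is an assumed, generic stand-in; the function name `events_to_histogram` and the `(x, y, timestamp, polarity)` event layout are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed event layout: (x, y, timestamp, polarity), with polarity 1 = ON, 0 = OFF.
Event = Tuple[int, int, float, int]


def events_to_histogram(
    events: Iterable[Event], height: int, width: int
) -> List[List[List[int]]]:
    """Accumulate events into a [2, height, width] count tensor (nested lists).

    Channel 0 counts OFF events and channel 1 counts ON events.
    Events that fall outside the sensor area are ignored.
    """
    hist = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, _t, polarity in events:
        if 0 <= x < width and 0 <= y < height:
            channel = 1 if polarity > 0 else 0
            hist[channel][y][x] += 1
    return hist


# Four events on a 3x2 sensor; the last one is out of bounds and is dropped.
demo_events = [(0, 0, 0.001, 1), (0, 0, 0.002, 1), (2, 1, 0.003, 0), (9, 9, 0.004, 1)]
demo_hist = events_to_histogram(demo_events, height=2, width=3)
```

Because every event within the time window is counted, spurious sensor events accumulate alongside real ones, which is the kind of noise the abstract says the wavelet stage is meant to filter.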

Problem

Research questions and friction points this paper is trying to address.

Reducing noise in dense event representations for event cameras

Enhancing real-time object detection with wavelet denoising

Optimizing inference speed for robotic perception systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Wavelet transform denoises dense event representations

Transformer network predicts objects from features

Dynamic Reorganization Convolution Block reduces inference time
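
The wavelet denoising idea listed above can be illustrated with a small, self-contained sketch. This summary does not state which wavelet or thresholding rule WD-DETR uses, and the paper integrates the transform into the backbone rather than applying it as a standalone filter. The code below therefore assumes a single-level 2D Haar transform with soft thresholding of the detail coefficients, a standard textbook scheme; `haar_denoise` and `soft_threshold` are hypothetical names.

```python
from typing import List

Grid = List[List[float]]


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink a coefficient toward zero by `threshold` (soft thresholding)."""
    magnitude = max(abs(value) - threshold, 0.0)
    return magnitude if value >= 0 else -magnitude


def haar_denoise(image: Grid, threshold: float) -> Grid:
    """Single-level 2D Haar denoising (assumed scheme, not the paper's exact method).

    Each 2x2 block is split into an average (LL) and three detail
    coefficients (LH, HL, HH). Details are soft-thresholded, then the
    block is reconstructed. Height and width must be even.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    out = [[0.0] * width for _ in range(height)]
    for r in range(0, height, 2):
        for c in range(0, width, 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll = (a + b + d + e) / 4.0  # local average
            lh = soft_threshold((a - b + d - e) / 4.0, threshold)  # left-right detail
            hl = soft_threshold((a + b - d - e) / 4.0, threshold)  # top-bottom detail
            hh = soft_threshold((a - b - d + e) / 4.0, threshold)  # diagonal detail
            # Inverse Haar transform for the block.
            out[r][c] = ll + lh + hl + hh
            out[r][c + 1] = ll - lh + hl - hh
            out[r + 1][c] = ll + lh - hl - hh
            out[r + 1][c + 1] = ll - lh - hl + hh
    return out


# An isolated spike (e.g. a burst of noise events at one pixel) is flattened.
noisy = [[0.0, 0.0], [0.0, 4.0]]
denoised = haar_denoise(noisy, threshold=1.0)
```

With `threshold=0.0` the function reconstructs its input exactly, so the threshold alone controls how much high-frequency content is suppressed. In practice it would need tuning so that sparse but genuine object edges are not removed along with the noise.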
