PEOD: A Pixel-Aligned Event-RGB Benchmark for Object Detection under Challenging Conditions

📅 2025-11-11
🤖 AI Summary
Existing Event-RGB fusion detection datasets suffer from insufficient coverage of challenging scenarios (e.g., low-light, overexposure, high-speed motion) and low spatial resolution (≤640×480), hindering fair evaluation of multimodal detectors under adverse conditions. To address this, we introduce the first large-scale, pixel-level spatiotemporally aligned, high-resolution Event-RGB object detection benchmark, comprising 130+ sequences and 340K fine-grained manual bounding-box annotations, with 57% depicting extreme conditions. The benchmark supports event-only, RGB-only, and fused multimodal inputs, establishing a high-fidelity evaluation standard for multimodal detection. Extensive experiments reveal that state-of-the-art fusion methods exhibit significant performance degradation under illumination degradation, whereas event-only models demonstrate superior robustness, indicating that current fusion strategies lack adaptability to modality mismatch.

📝 Abstract
Robust object detection in challenging scenarios increasingly relies on event cameras, yet existing Event-RGB datasets remain constrained by sparse coverage of extreme conditions and low spatial resolution (≤640×480), which prevents comprehensive evaluation of detectors under such scenarios. To address these limitations, we propose PEOD, the first large-scale, pixel-aligned, high-resolution (1280×720) Event-RGB dataset for object detection under challenging conditions. PEOD contains 130+ spatiotemporally aligned sequences and 340K manual bounding boxes, with 57% of the data captured under low-light, overexposure, or high-speed motion. Furthermore, we benchmark 14 methods across three input configurations (event-based, RGB-based, and Event-RGB fusion) on PEOD. On the full test set and the normal subset, fusion-based models achieve the best performance. However, on the illumination-challenge subset, the top event-based model outperforms all fusion models, while fusion models still outperform their RGB-based counterparts, revealing the limits of existing fusion methods when the frame modality is severely degraded. PEOD establishes a realistic, high-quality benchmark for multimodal perception and facilitates future research.
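The abstract's three input configurations include event-only inputs. A common way to feed an asynchronous event stream to a frame-based detector is to accumulate events into a fixed-size tensor. Below is a minimal sketch, assuming events arrive as (x, y, t, polarity) records at PEOD's 1280×720 resolution; the function name and record layout are illustrative, not PEOD's actual API:

```python
import numpy as np

def events_to_histogram(events, height=720, width=1280):
    """Accumulate a chunk of events into a 2-channel count image:
    channel 0 counts positive-polarity events, channel 1 negative.
    `events` is a structured array with integer fields x, y, t, p."""
    hist = np.zeros((2, height, width), dtype=np.float32)
    x = events["x"].astype(np.int64)
    y = events["y"].astype(np.int64)
    pos = (events["p"] > 0).astype(np.int64)  # 1 = positive, 0 = negative
    # np.add.at handles repeated (channel, y, x) indices correctly,
    # unlike plain fancy-index assignment.
    np.add.at(hist, (1 - pos, y, x), 1.0)
    return hist
```

Detectors typically consume a short time window of such histograms (or richer representations such as voxel grids or time surfaces); the 2-channel count image above is the simplest member of that family.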
Problem

Research questions and friction points this paper is trying to address.

Existing Event-RGB datasets lack extreme condition coverage and high resolution
PEOD provides high-resolution aligned Event-RGB data for challenging detection scenarios
Current fusion methods struggle when frame modality is severely degraded
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-resolution pixel-aligned Event-RGB dataset
Fusion-based models for multimodal object detection
Event-based models outperform under illumination challenges
Luoping Cui
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Hanqing Liu
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Mingjie Liu
Assistant Professor, Department of Chemistry, University of Florida
computational materials science, energy conversion and storage, machine learning, data science, AI-driven materials design
Endian Lin
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Donghong Jiang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Yuhao Wang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
Chuang Zhu
School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China