🤖 AI Summary
Existing Event-RGB fusion detection datasets offer sparse coverage of challenging scenarios (e.g., low-light, overexposure, high-speed motion) and low spatial resolution (≤640×480), hindering fair evaluation of multimodal detectors under adverse conditions. To address this, we introduce PEOD, the first large-scale, pixel-level spatiotemporally aligned, high-resolution Event-RGB object detection benchmark, comprising 130+ sequences and 340K fine-grained manual bounding-box annotations, 57% of which were captured under extreme conditions. The benchmark supports event-only, RGB-only, and fused multimodal inputs, establishing a high-fidelity evaluation standard for multimodal detection. Extensive experiments reveal that state-of-the-art fusion methods degrade significantly when illumination is poor, whereas event-only models remain markedly more robust, indicating that current fusion strategies adapt poorly when one modality is severely degraded.
📝 Abstract
Robust object detection in challenging scenarios increasingly relies on event cameras, yet existing Event-RGB datasets remain constrained by sparse coverage of extreme conditions and low spatial resolution (≤640×480), which prevents comprehensive evaluation of detectors under such scenarios. To address these limitations, we propose PEOD, the first large-scale, pixel-aligned and high-resolution (1280×720) Event-RGB dataset for object detection under challenging conditions. PEOD contains 130+ spatiotemporally aligned sequences and 340K manual bounding boxes, with 57% of the data captured under low-light, overexposure, or high-speed motion. Furthermore, we benchmark 14 methods across three input configurations (event-based, RGB-based, and Event-RGB fusion) on PEOD. On the full test set and the normal subset, fusion-based models achieve the best performance. On the illumination-challenge subset, however, the top event-based model outperforms all fusion models, while fusion models still outperform their RGB-based counterparts, indicating the limits of existing fusion methods when the frame modality is severely degraded. PEOD thus establishes a realistic, high-quality benchmark for multimodal perception and facilitates future research.
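For concreteness, below is a minimal sketch of how a pixel-aligned PEOD sample and the three benchmark input configurations might be consumed in code. The class name `PEODSample`, its field layout, the `condition` tags, and the voxel-grid event representation are illustrative assumptions, not the dataset's actual API; they only mirror the properties stated above (1280×720 resolution, aligned event/RGB pairs, box annotations, three input configurations).

```python
# Hypothetical sketch of a PEOD sample and the three input configurations.
# All names and layouts here are assumptions for illustration, not PEOD's API.
from dataclasses import dataclass
from typing import Tuple
import numpy as np


@dataclass
class PEODSample:
    """One spatiotemporally aligned Event-RGB pair at 1280x720 (assumed layout)."""
    rgb: np.ndarray        # (720, 1280, 3) uint8 frame
    events: np.ndarray     # (N, 4) events as (t_us, x, y, polarity)
    boxes: np.ndarray      # (M, 4) boxes as (x1, y1, x2, y2) in pixels
    labels: np.ndarray     # (M,) class ids
    condition: str = "normal"  # e.g. "low_light", "overexposure", "high_speed"


def events_to_voxel(events: np.ndarray, bins: int = 5,
                    hw: Tuple[int, int] = (720, 1280)) -> np.ndarray:
    """Rasterize an event stream into a (bins, H, W) voxel grid -- one common
    event representation; the dataset itself may ship raw streams instead."""
    voxel = np.zeros((bins, *hw), dtype=np.float32)
    if len(events) == 0:
        return voxel
    t = events[:, 0]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1)  # normalize timestamps
    b = np.clip((t_norm * bins).astype(int), 0, bins - 1)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    pol = np.where(events[:, 3] > 0, 1.0, -1.0)
    np.add.at(voxel, (b, y, x), pol)  # accumulate signed event counts
    return voxel


def make_input(sample: PEODSample, config: str):
    """Assemble a detector input for each of the three benchmark configurations."""
    if config == "event":
        return events_to_voxel(sample.events)
    if config == "rgb":
        return sample.rgb
    if config == "fusion":
        return (sample.rgb, events_to_voxel(sample.events))
    raise ValueError(f"unknown config: {config}")


if __name__ == "__main__":
    # Toy sample with random events, just to exercise the three configurations.
    rng = np.random.default_rng(0)
    ev = np.stack([rng.integers(0, 10_000, 500),   # timestamps (us)
                   rng.integers(0, 1280, 500),     # x
                   rng.integers(0, 720, 500),      # y
                   rng.integers(0, 2, 500)],       # polarity
                  axis=1).astype(np.float64)
    sample = PEODSample(rgb=np.zeros((720, 1280, 3), np.uint8), events=ev,
                        boxes=np.empty((0, 4)), labels=np.empty((0,), int),
                        condition="low_light")
    for cfg in ("event", "rgb", "fusion"):
        print(cfg, type(make_input(sample, cfg)))
```

Under this assumed layout, slicing the test set by `condition` is what yields the normal and illumination-challenge subsets on which the fusion-based and event-based results above are reported.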