EIFNet: Leveraging Event-Image Fusion for Robust Semantic Segmentation

📅 2025-07-29
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This paper addresses the lack of robustness in semantic segmentation that stems from fusing two heterogeneous modalities: sparse, noisy event streams and dense, semantically rich RGB images. To this end, we propose EIFNet, an end-to-end multimodal fusion network. Our key contributions are: (1) an Adaptive Event Feature Refinement Module that integrates multi-scale activity modeling with spatial attention to make event representations more reliable; and (2) a Modality-Adaptive Recalibration Module coupled with a Multi-Head Attention Gated Fusion Module that align features across modalities and integrate them with dynamic weighting. The proposed method achieves state-of-the-art performance on the DDD17-Semantic and DSEC-Semantic benchmarks, improving both the accuracy and the stability of event-driven semantic segmentation under challenging conditions, including high-motion scenes and low-light environments. By jointly exploiting asynchronous events and synchronous intensity frames, our approach offers a practical route to high-dynamic-range, low-latency scene understanding.
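The summary describes the event refinement step only at a high level. Below is a minimal PyTorch-style sketch of that idea, multi-scale convolutions followed by a spatial attention gate over the event features; the class name, layer sizes, and kernel choices are illustrative assumptions, not the paper's actual AEFRM implementation.

```python
import torch
import torch.nn as nn

class EventFeatureRefinement(nn.Module):
    """Refines event features via multi-scale convolutions and spatial attention.

    A minimal sketch of the AEFRM idea; the paper's exact layer
    configuration is not specified here.
    """
    def __init__(self, channels: int):
        super().__init__()
        # Multi-scale "activity" branches: parallel convs with growing receptive fields.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])
        self.merge = nn.Conv2d(3 * channels, channels, kernel_size=1)
        # Spatial attention: a one-channel map that down-weights noisy regions.
        self.attn = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) event representation (e.g. a voxel grid).
        ms = torch.cat([b(x) for b in self.branches], dim=1)
        fused = self.merge(ms)
        # Pool across channels to build the spatial-attention input.
        avg = fused.mean(dim=1, keepdim=True)
        mx, _ = fused.max(dim=1, keepdim=True)
        gate = self.attn(torch.cat([avg, mx], dim=1))  # (B, 1, H, W)
        return x + gate * fused                        # residual refinement
```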

📝 Abstract
Event-based semantic segmentation explores the potential of event cameras, which offer high dynamic range and fine temporal resolution, to achieve robust scene understanding in challenging environments. Despite these advantages, the task remains difficult due to two main challenges: extracting reliable features from sparse and noisy event streams, and effectively fusing them with dense, semantically rich image data that differ in structure and representation. To address these issues, we propose EIFNet, a multi-modal fusion network that combines the strengths of both event and frame-based inputs. The network includes an Adaptive Event Feature Refinement Module (AEFRM), which improves event representations through multi-scale activity modeling and spatial attention. In addition, we introduce a Modality-Adaptive Recalibration Module (MARM) and a Multi-Head Attention Gated Fusion Module (MGFM), which align and integrate features across modalities using attention mechanisms and gated fusion strategies. Experiments on DDD17-Semantic and DSEC-Semantic datasets show that EIFNet achieves state-of-the-art performance, demonstrating its effectiveness in event-based semantic segmentation.
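As a companion to the refinement sketch above, here is a minimal sketch of an attention-gated fusion step in the spirit of MGFM: multi-head cross-attention aligns event features to image features, then a learned gate blends the two streams per position. The interface (flattened (B, HW, C) feature maps) and all names are assumptions for illustration, not the paper's API.

```python
import torch
import torch.nn as nn

class AttentionGatedFusion(nn.Module):
    """Cross-modal fusion: multi-head attention aligns event features to
    image features, then a learned gate blends the two streams.

    A minimal sketch of the MGFM idea, assuming flattened (B, HW, C)
    feature maps.
    """
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, img: torch.Tensor, evt: torch.Tensor) -> torch.Tensor:
        # Image features query the event stream (cross-modal alignment).
        aligned, _ = self.cross_attn(query=img, key=evt, value=evt)
        # A per-position gate decides how much event evidence to admit.
        g = self.gate(torch.cat([img, aligned], dim=-1))  # (B, HW, C)
        return g * aligned + (1.0 - g) * img              # dynamic weighting
```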
Problem

Research questions and friction points this paper is trying to address.

Extracting reliable features from sparse, noisy event streams
Fusing event and image data with differing structures effectively
Achieving robust semantic segmentation in challenging environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Event Feature Refinement Module (AEFRM)
Modality-Adaptive Recalibration Module (MARM)
Multi-Head Attention Gated Fusion Module (MGFM); a wiring sketch follows this list
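To make the data flow concrete, the snippet below wires the two hypothetical sketches above end to end; the tensor shapes and the flatten/transpose step are assumptions rather than the paper's pipeline.

```python
import torch
# Assumes the EventFeatureRefinement and AttentionGatedFusion sketches above.

refine = EventFeatureRefinement(channels=64)
fuse = AttentionGatedFusion(dim=64, heads=4)

evt = torch.randn(2, 64, 32, 32)  # event voxel-grid features (B, C, H, W)
img = torch.randn(2, 64, 32, 32)  # image backbone features (B, C, H, W)

evt = refine(evt)
# Flatten spatial dims to (B, HW, C) for attention-based fusion.
evt_seq = evt.flatten(2).transpose(1, 2)
img_seq = img.flatten(2).transpose(1, 2)
fused = fuse(img_seq, evt_seq)    # (B, HW, C), feeds the segmentation head
```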
Zhijiang Li
Beijing University of Posts and Telecommunications, Beijing 100876, China
Haoran He
Hong Kong University of Science and Technology
machine learning · reinforcement learning