Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems in fusing RGB frames with event streams for object detection: high computational cost and suboptimal feature fusion, both of which stem from the heterogeneity and redundancy of the two modalities. The authors propose Hyper-FEOD, a framework with two components. Sparse Hypergraph-enhanced Cross-Modal Fusion (S-HCF) builds an event-guided activity map and applies high-order hypergraph modeling only to the selected motion-critical tokens, capturing non-local RGB-event dependencies without the usual cost of hypergraph computation. A Fine-Grained Mixture of Experts (FG-MoE) module uses pixel-level spatial gating to route features to hypergraph experts specialized for object boundaries, internal textures, and backgrounds; a load-balancing loss and zero-initialization keep training stable and leave the pre-trained backbone's feature distribution undisturbed. On mainstream RGB-Event benchmarks, Hyper-FEOD is reported to outperform state-of-the-art methods on the accuracy-efficiency trade-off while remaining lightweight enough for real-time edge deployment.
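
The sparse fusion idea can be illustrated with a small NumPy sketch. It is not the authors' implementation; the page gives no architectural details, so everything below is an assumption made for illustration: activity is taken as the absolute per-pixel event count, the top-k locations are kept, hyperedges are built by k-nearest neighbors in feature space over the stacked RGB and event tokens, and the propagation step is the standard normalized hypergraph convolution without learnable weights. The function names (`select_active_tokens`, `knn_incidence`, `hypergraph_conv`, `sparse_hypergraph_fuse`) are hypothetical.

```python
import numpy as np

def select_active_tokens(event_counts, k):
    """Flat indices of the k locations with the largest |event count| (assumed activity map)."""
    activity = np.abs(event_counts).reshape(-1)
    order = np.argsort(-activity, kind="stable")
    return np.sort(order[:k])

def knn_incidence(feats, num_neighbors):
    """Incidence matrix H (nodes x hyperedges): one hyperedge per node,
    joining that node with its nearest neighbors in feature space."""
    n = feats.shape[0]
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    nbrs = np.argsort(d2, axis=1, kind="stable")[:, :num_neighbors]
    H = np.zeros((n, n))
    for e in range(n):
        H[nbrs[e], e] = 1.0
        H[e, e] = 1.0  # every node belongs to its own hyperedge, so degrees are >= 1
    return H

def hypergraph_conv(X, H):
    """Dv^-1/2 H De^-1 H^T Dv^-1/2 X with unit hyperedge weights, no learnable parameters."""
    dv = H.sum(axis=1)            # node degrees
    de = H.sum(axis=0)            # hyperedge degrees
    A = (H / de) @ H.T            # H De^-1 H^T
    A = A / np.sqrt(dv)[:, None] / np.sqrt(dv)[None, :]
    return A @ X

def sparse_hypergraph_fuse(rgb, evt, event_counts, k, num_neighbors=3):
    """Refine only the k most active RGB tokens; all other tokens pass through unchanged."""
    idx = select_active_tokens(event_counts, k)
    nodes = np.concatenate([rgb[idx], evt[idx]], axis=0)   # 2k cross-modal nodes
    refined = hypergraph_conv(nodes, knn_incidence(nodes, num_neighbors))
    out = rgb.copy()
    out[idx] = rgb[idx] + refined[:k]                      # residual update of RGB tokens
    return out, idx

# Toy example: a 4x4 feature map with 8 channels and three active pixels.
rng = np.random.default_rng(0)
h, w, c = 4, 4, 8
event_counts = np.zeros((h, w))
event_counts[1, 2], event_counts[3, 0], event_counts[0, 0] = 5, -7, 2
rgb = rng.normal(size=(h * w, c))
evt = rng.normal(size=(h * w, c))
fused, idx = sparse_hypergraph_fuse(rgb, evt, event_counts, k=3)
```

Because the hypergraph is built over 2k nodes rather than all H×W locations, the pairwise-distance and propagation costs scale with the number of active tokens, which is the efficiency argument the summary makes.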

📝 Abstract

Integrating frame-based RGB cameras with event streams offers a promising solution for robust object detection under challenging dynamic conditions. However, the inherent heterogeneity and data redundancy of these modalities often lead to prohibitive computational overhead or suboptimal feature fusion. In this paper, we propose Hyper-FEOD, a high-performance and efficient detection framework, which synergistically optimizes multi-modal interaction through two core components. First, we introduce Sparse Hypergraph-enhanced Cross-Modal Fusion (S-HCF), which leverages the inherent sparsity of event streams to construct an event-guided activity map. By performing high-order hypergraph modeling exclusively on selected motion-critical sparse tokens, S-HCF captures complex non-local dependencies between RGB and event data while overcoming the traditional complexity bottlenecks of hypergraph computation. Second, we design a Fine-Grained Mixture of Experts (FG-MoE) Enhancement module to address the diverse semantic requirements of different image regions. This module employs specialized hypergraph experts tailored for object boundaries, internal textures, and backgrounds, utilizing a pixel-level spatial gating mechanism to adaptively route and enhance features. Combined with a load-balancing loss and zero-initialization strategy, FG-MoE ensures stable training and precise feature refinement without disrupting the pre-trained backbone's distribution. Experimental results on mainstream RGB-Event benchmarks demonstrate that Hyper-FEOD achieves a superior accuracy-efficiency trade-off, outperforming state-of-the-art methods while maintaining a lightweight footprint suitable for real-time edge deployment.
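
As a rough illustration of the FG-MoE ingredients named in the abstract (pixel-level gating, a load-balancing loss, and zero-initialization), here is a minimal NumPy sketch under stated assumptions. The experts are plain linear maps standing in for the paper's hypergraph experts, the auxiliary loss uses the common Switch-Transformer-style form (number of experts times the sum over experts of routed fraction times mean gate probability), and zero-initialization is applied to a final output projection inside a residual branch. None of these choices are confirmed by the abstract, and the names `PixelMoE` and `load_balancing_loss` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class PixelMoE:
    """Per-pixel soft routing over a few experts, added back as a residual."""

    def __init__(self, channels, num_experts=3, seed=0):
        rng = np.random.default_rng(seed)
        self.gate_w = rng.normal(scale=0.02, size=(channels, num_experts))
        # Linear experts are placeholders for boundary / texture / background experts.
        self.expert_w = rng.normal(scale=0.02, size=(num_experts, channels, channels))
        # Zero-initialized output projection: the module is an identity at the start.
        self.out_w = np.zeros((channels, channels))

    def forward(self, x):
        # x: (H, W, C) feature map
        probs = softmax(x @ self.gate_w)                            # (H, W, E) gate per pixel
        expert_out = np.einsum("hwc,ecd->hwed", x, self.expert_w)   # (H, W, E, C)
        mixed = (probs[..., None] * expert_out).sum(axis=2)         # (H, W, C)
        return x + mixed @ self.out_w, probs

def load_balancing_loss(probs):
    """E * sum_i f_i * P_i, where f_i is the fraction of pixels whose top-1 expert is i
    and P_i is the mean gate probability of expert i (assumed formulation)."""
    num_experts = probs.shape[-1]
    flat = probs.reshape(-1, num_experts)
    top1 = flat.argmax(axis=1)
    frac = np.bincount(top1, minlength=num_experts) / flat.shape[0]
    mean_prob = flat.mean(axis=0)
    return num_experts * float((frac * mean_prob).sum())

# Toy example on a 4x5 feature map with 8 channels.
x = np.random.default_rng(1).normal(size=(4, 5, 8))
moe = PixelMoE(channels=8)
y, probs = moe.forward(x)
aux = load_balancing_loss(probs)
```

With the output projection at zero, `y` equals `x` at initialization, so inserting the module does not change what a pre-trained backbone produces until training moves `out_w` away from zero. The auxiliary loss grows when most pixels are sent to a single expert, which discourages routing collapse.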

Problem

Research questions and friction points this paper is trying to address.

frame-event object detection

multi-modal fusion

data redundancy

heterogeneous modalities

efficient detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Hypergraph

Cross-Modal Fusion

Mixture of Experts