EvRainDrop: HyperGraph-guided Completion for Effective Frame and Event Stream Aggregation

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Event camera data exhibits spatial sparsity and temporal density, leading to severe undersampling in conventional frame- or voxel-based representations. To address this, we propose a hypergraph-guided spatiotemporal event stream completion framework: (1) a cross-spatiotemporal hypergraph is constructed to connect sparse event tokens, while RGB tokens are fused to enable multimodal collaborative completion; (2) a joint architecture integrating hypergraph neural networks and self-attention is designed to support dynamic message passing and multi-timestep feature aggregation. This work is the first to introduce hypergraph structures into event stream modeling, effectively mitigating spatial undersampling and enabling end-to-end joint completion and feature learning of RGB and event streams. Extensive experiments on both single-label and multi-label event classification tasks show state-of-the-art performance, demonstrating the efficacy of our approach in event completion and multimodal fusion.

📝 Abstract
Event cameras produce asynchronous event streams that are spatially sparse yet temporally dense. Mainstream event representation learning algorithms typically use event frames, voxels, or tensors as input. Although these approaches have achieved notable progress, they struggle to address the undersampling problem caused by spatial sparsity. In this paper, we propose a novel hypergraph-guided spatio-temporal event stream completion mechanism, which connects event tokens across different times and spatial locations via hypergraphs and leverages contextual message passing to complete these sparse events. The proposed method can flexibly incorporate RGB tokens as nodes in the hypergraph within this completion framework, enabling multi-modal hypergraph-based information completion. Subsequently, we aggregate hypergraph node information across different time steps through self-attention, enabling effective learning and fusion of multi-modal features. Extensive experiments on both single- and multi-label event classification tasks fully validate the effectiveness of our proposed framework. The source code of this paper will be released at https://github.com/Event-AHU/EvRainDrop.
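The completion mechanism the abstract describes — hypergraph message passing that lets sparse event tokens borrow context from RGB tokens on shared hyperedges — can be sketched minimally in NumPy. This is an illustrative reconstruction, not the paper's implementation: the incidence matrix `H`, the mean-pooling aggregation, and the function name `hypergraph_complete` are all assumptions; the actual model presumably uses learned hypergraph neural network layers.

```python
import numpy as np

def hypergraph_complete(X, H):
    """One round of hypergraph message passing: node -> hyperedge -> node.

    X: (N, d) token features (event and RGB tokens stacked as nodes).
    H: (N, E) binary incidence matrix; H[i, j] = 1 if node i lies on hyperedge j.
    Returns (N, d) completed node features.
    """
    deg_e = np.maximum(H.sum(axis=0), 1)          # hyperedge degrees (E,)
    deg_v = np.maximum(H.sum(axis=1), 1)          # node degrees (N,)
    edge_feat = (H.T @ X) / deg_e[:, None]        # gather: mean of member-node features
    node_out = (H @ edge_feat) / deg_v[:, None]   # scatter: mean over incident hyperedges
    return node_out

# Two zero-feature event tokens (nodes 0, 1) are completed from the
# RGB tokens (nodes 2, 3) they share hyperedges with.
X = np.array([[0., 0.], [0., 0.], [1., 1.], [2., 2.]])
H = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
X_completed = hypergraph_complete(X, H)
```

After one pass, the empty event tokens carry the mean features of the hyperedges they belong to, which is the sense in which the hypergraph "completes" spatially undersampled events.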
Problem

Research questions and friction points this paper is trying to address.

Addresses spatial sparsity in event camera data streams
Completes sparse event streams using hypergraph-guided mechanisms
Enables multi-modal fusion between event and RGB data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hypergraph-guided event stream completion mechanism
Multi-modal hypergraph information completion framework
Self-attention based temporal node aggregation
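The third contribution, self-attention over hypergraph node features from different time steps, can be illustrated with a single-head scaled dot-product sketch. This is a simplification under stated assumptions: no learned query/key/value projections and a plain per-timestep feature matrix `Z`, whereas the paper's architecture presumably uses learned multi-head attention.

```python
import numpy as np

def attend_over_time(Z):
    """Scaled dot-product self-attention over per-timestep node summaries.

    Z: (T, d) hypergraph node features pooled at each of T time steps.
    Returns (T, d) features where each step mixes in context from all steps.
    """
    d = Z.shape[-1]
    scores = Z @ Z.T / np.sqrt(d)                  # (T, T) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # row-wise softmax weights
    return w @ Z                                   # convex mix of timestep features

Z = np.array([[1., 0.], [0., 1.], [1., 1.]])
Z_fused = attend_over_time(Z)
```

Each output row is a convex combination of all timestep features, so information from temporally dense events is aggregated into every step before classification.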
Futian Wang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Fan Zhang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Xiao Wang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Mengqi Wang
School of Computer Science and Technology, Anhui University, Hefei 230601, China
Dexing Huang
Institute of Automation, Chinese Academy of Sciences
Medical Image Processing, Computer Vision, AIGC, VLMs
Jin Tang
Anhui University
Computer vision, intelligent video analysis