Revealing Latent Information: A Physics-inspired Self-supervised Pre-training Framework for Noisy and Sparse Events

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Event camera data is sparse, noisy, and encodes only brightness changes, posing significant challenges for feature extraction. To address this, we propose a physics-inspired self-supervised pre-training framework. First, leveraging the event generation mechanism, we design a difference-guided masked modeling objective to reconstruct temporal intensity difference maps. Second, we introduce a backbone-fixed feature transition mechanism to preserve structural information within disentangled representations. Third, region-focused contrastive learning is employed to enhance semantic discriminability. The framework operates entirely without manual annotations and significantly improves performance across downstream tasks, including object recognition, semantic segmentation, and optical flow estimation, outperforming state-of-the-art methods. It demonstrates strong robustness to noise and superior generalization capability. Code and datasets are publicly available.

📝 Abstract
The event camera, a novel neuromorphic vision sensor, records data with high temporal resolution and wide dynamic range, offering new possibilities for accurate visual representation in challenging scenarios. However, event data is inherently sparse and noisy, mainly reflecting brightness changes, which complicates effective feature extraction. To address this, we propose a self-supervised pre-training framework that fully reveals latent information in event data, including edge information and texture cues. Our framework consists of three stages: Difference-guided Masked Modeling, inspired by the physical sampling process of events, reconstructs temporal intensity difference maps to extract enhanced information from raw event data. Backbone-fixed Feature Transition contrasts event and image features without updating the backbone, preserving the representations learned from masked modeling and stabilizing their effect on contrastive learning. Focus-aimed Contrastive Learning updates the entire model to improve semantic discrimination by focusing on high-value regions. Extensive experiments show our framework is robust and consistently outperforms state-of-the-art methods on various downstream tasks, including object recognition, semantic segmentation, and optical flow estimation. The code and dataset are available at https://github.com/BIT-Vision/EventPretrain.
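The physical intuition behind the difference-guided objective is that each event signals a log-intensity change of roughly one contrast threshold, so summing signed event polarities per pixel over a time window approximates a temporal intensity difference map. A minimal sketch of that accumulation, assuming a hypothetical helper (`event_difference_map` and the threshold `c` are illustrative, not the paper's implementation):

```python
import numpy as np

def event_difference_map(xs, ys, ps, height, width, c=0.2):
    """Approximate a temporal intensity difference map from events.

    An event at pixel (x, y) with polarity p in {-1, +1} indicates the
    log-intensity changed by about +/- c (the contrast threshold), so
    summing c * p per pixel over a window approximates the log-intensity
    difference across that window. Illustrative sketch only.
    """
    diff = np.zeros((height, width), dtype=np.float32)
    np.add.at(diff, (ys, xs), c * ps)  # unbuffered per-pixel accumulation
    return diff

# Toy example: two positive events at (x=1, y=0), one negative at (x=2, y=3).
xs = np.array([1, 1, 2])
ys = np.array([0, 0, 3])
ps = np.array([+1, +1, -1])
d = event_difference_map(xs, ys, ps, height=4, width=4)
```

`np.add.at` is used instead of plain fancy-indexed assignment so that repeated events at the same pixel accumulate rather than overwrite each other.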
Problem

Research questions and friction points this paper is trying to address.

Extracting features from noisy sparse event data
Revealing latent edge and texture information
Improving semantic discrimination in event-based vision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-inspired self-supervised pre-training framework
Difference-guided Masked Modeling for enhanced information
Focus-aimed Contrastive Learning for semantic discrimination
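The contrastive stages align event features with paired image features. One generic way to realize such alignment is a symmetric InfoNCE loss over a batch of matched event/image pairs; the sketch below is a standard formulation, not the paper's focus-aimed variant, which additionally emphasizes high-value regions:

```python
import numpy as np

def info_nce(event_feats, image_feats, temperature=0.07):
    """Symmetric InfoNCE loss between L2-normalized feature batches.

    Row k of each array is one matched event/image pair, so the positive
    logits lie on the diagonal of the pairwise similarity matrix.
    Generic contrastive sketch, not the paper's exact objective.
    """
    e = event_feats / np.linalg.norm(event_feats, axis=1, keepdims=True)
    i = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    logits = e @ i.T / temperature        # pairwise cosine similarities
    idx = np.arange(len(e))               # matched pairs on the diagonal

    def xent(lg):
        # stable log-softmax per row, then pick the diagonal (positive) term
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average of event->image and image->event directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Perfectly aligned pairs put the maximum similarity on the diagonal and thus yield a lower loss than mismatched pairs, which is the gradient signal that pulls event features toward their image counterparts.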
Lin Zhu
Beijing Institute of Technology
Ruonan Liu
Shanghai Jiao Tong University
Embodied AI, Vision Navigation, Fault Diagnosis
Xiao Wang
Anhui University
Lizhi Wang
Beijing Normal University
Hua Huang
Beijing Normal University