Temporal-Guided Visual Foundation Models for Event-Based Vision

📅 2025-11-09
🤖 AI Summary
Event cameras offer advantages in challenging environments, but their asynchronous event streams are hard to model, and existing approaches make little use of pretrained Visual Foundation Models (VFMs). This paper proposes Temporal-Guided VFM (TGVFM), a framework that transfers image-pretrained VFMs to event-based vision. Its temporal context fusion block combines Long-Range Temporal Attention, Dual Spatiotemporal Attention, and a Deep Feature Guidance Mechanism to jointly encode semantic content and temporal dynamics. The pipeline pairs event-to-video models retrained on real-world data with a transformer-based VFM backbone augmented by this block. On semantic segmentation, depth estimation, and object detection, it reports improvements of 16%, 21%, and 16%, respectively, over existing methods, which the authors describe as state-of-the-art.

📝 Abstract
Event cameras offer unique advantages for vision tasks in challenging environments, yet processing asynchronous event streams remains an open challenge. While existing methods rely on specialized architectures or resource-intensive training, the potential of modern Visual Foundation Models (VFMs) pretrained on image data remains under-explored for event-based vision. To bridge this gap, we propose Temporal-Guided VFM (TGVFM), a novel framework that seamlessly integrates VFMs with our temporal context fusion block. The temporal block introduces three key components: (1) Long-Range Temporal Attention to model global temporal dependencies, (2) Dual Spatiotemporal Attention for multi-scale frame correlation, and (3) a Deep Feature Guidance Mechanism to fuse semantic-temporal features. By retraining event-to-video models on real-world data and leveraging transformer-based VFMs, TGVFM preserves spatiotemporal dynamics while harnessing pretrained representations. Experiments demonstrate state-of-the-art performance across semantic segmentation, depth estimation, and object detection, with improvements of 16%, 21%, and 16% over existing methods, respectively. Overall, this work unlocks the cross-modality potential of image-based VFMs for event-based vision with temporal reasoning. Code is available at https://github.com/XiaRho/TGVFM.

Problem

Research questions and friction points this paper is trying to address.

Processing asynchronous event streams for vision tasks in challenging environments
Leveraging pretrained Visual Foundation Models for event-based vision applications
Integrating temporal reasoning with visual representations for cross-modality learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Visual Foundation Models with temporal fusion block
Uses Long-Range Temporal Attention for global dependencies
Combines Dual Spatiotemporal Attention with Deep Feature Guidance
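
To make the idea of attention over global temporal dependencies more concrete, here is a minimal sketch of plain scaled dot-product attention applied along the time axis for a single spatial token. It is an illustration only, not the TGVFM implementation: the function names `softmax` and `temporal_attention` are hypothetical, and the learned query, key, and value projections of a real attention layer are replaced by identity mappings, which is an assumption made to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frames):
    """Attend across time for one spatial token.

    frames: list of T feature vectors (each a list of D floats), one per
    reconstructed frame. Assumption: queries, keys, and values are the raw
    features (identity projections), unlike a trained attention layer.
    Returns (outputs, weights), where outputs[t] is the temporally fused
    feature for frame t and weights[t] is its attention row over all frames.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the window.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in frames]
        attn = softmax(scores)
        # Weighted sum of all frames' features, channel by channel.
        fused = [sum(a * frame[d] for a, frame in zip(attn, frames)) for d in range(dim)]
        weights.append(attn)
        outputs.append(fused)
    return outputs, weights


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # T=3 frames, D=2 channels
    fused, attn = temporal_attention(features)
    for t, row in enumerate(attn):
        print(f"frame {t} attends with weights {[round(a, 3) for a in row]}")
```

Because every frame attends to every other frame in the window, each output mixes information from the whole sequence; in a full model this would run per spatial location on backbone features, with learned projections and multiple heads.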
Ruihao Xia
East China University of Science and Technology
Event-based Vision, Domain Adaptation, Semantic Segmentation
Junhong Cai
Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
Luziwei Leng
ACSLab, Huawei Technologies Company Ltd., Shenzhen 518055, China
Liuyi Wang
Tongji University
Computer Vision, Natural Language Processing, Artificial Intelligence
Chengju Liu
Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai 201210, China, and also with State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University, Shanghai 201210, China
Ran Cheng
Department of Data Science and Artificial Intelligence and the Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR 999077, China
Yang Tang
Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
Pan Zhou
Singapore Management University, 188065, Singapore