🤖 AI Summary
Existing event-based depth estimation methods suffer from limited generalizability because they rely on small, manually annotated datasets. To address this, we introduce EvtSlowTV, the first large-scale, unconstrained event-based depth dataset constructed from publicly available YouTube videos, containing over 13 billion asynchronous events across diverse environments and dynamic motion scenarios. Departing from conventional frame-reconstruction paradigms, we propose an end-to-end self-supervised learning framework that regresses depth directly from raw events, exploiting their high dynamic range and temporal precision without frame-level supervision or annotations. Extensive experiments show that models trained on EvtSlowTV generalize significantly better under complex motion and real-world conditions. This work establishes a foundational data resource and a principled methodological framework for the practical deployment of event-driven monocular depth estimation.
📝 Abstract
Event cameras, with their high dynamic range (HDR) and low latency, offer a promising alternative for robust depth estimation in challenging environments. However, many event-based depth estimation approaches are constrained by small-scale annotated datasets, which limits their generalizability to real-world scenarios. To bridge this gap, we introduce EvtSlowTV, a large-scale event camera dataset curated from publicly available YouTube footage, containing more than 13B events across varied environmental conditions and motions, including seasonal hiking, flying, scenic driving, and underwater exploration. EvtSlowTV is an order of magnitude larger than existing event datasets, providing an unconstrained, naturalistic setting for event-based depth learning. We show that EvtSlowTV is well suited to a self-supervised learning framework that capitalises on the HDR potential of raw event streams, and we further demonstrate that training with EvtSlowTV improves the model's ability to generalise to complex scenes and motions. Our approach removes the need for frame-based annotations and preserves the asynchronous nature of event data.
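To make the data format concrete: event cameras emit a stream of tuples (x, y, t, polarity) rather than intensity frames. The sketch below is a minimal, heavily simplified illustration (not the paper's model) of how such a raw stream could be fed to a depth network without first reconstructing video frames; the toy count-grid encoding, the EventDepthNet module, and the DAVIS-like 346x260 resolution are all assumptions made for illustration only.

```python
# Minimal sketch (not the authors' code): consuming a raw asynchronous event
# stream directly, without intensity-frame reconstruction.
# EventDepthNet and the count-grid encoding are hypothetical placeholders.
import torch
import torch.nn as nn


class EventDepthNet(nn.Module):
    """Toy network mapping a window of raw events to a dense depth map."""

    def __init__(self, height=260, width=346):  # DAVIS346-like resolution (assumed)
        super().__init__()
        self.height, self.width = height, width
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Softplus(),  # strictly positive depth
        )

    def forward(self, events):
        # events: (N, 4) tensor of (x, y, t, polarity) -- the native event format.
        # Accumulate per-polarity event counts as a simple, label-free input encoding.
        grid = torch.zeros(2, self.height, self.width, device=events.device)
        x, y = events[:, 0].long(), events[:, 1].long()
        p = (events[:, 3] > 0).long()
        grid.index_put_((p, y, x),
                        torch.ones(len(events), device=events.device),
                        accumulate=True)
        return self.encoder(grid.unsqueeze(0))  # (1, 1, H, W) depth


# Usage with a synthetic window of 10k events (x, y, t, polarity).
events = torch.stack([
    torch.randint(0, 346, (10000,)).float(),        # x coordinate
    torch.randint(0, 260, (10000,)).float(),        # y coordinate
    torch.sort(torch.rand(10000))[0],               # monotonically increasing timestamps
    torch.randint(0, 2, (10000,)).float() * 2 - 1,  # polarity in {-1, +1}
], dim=1)
depth = EventDepthNet()(events)
print(depth.shape)  # torch.Size([1, 1, 260, 346])
```

In the paper's setting the supervision signal comes from the events themselves rather than ground-truth depth labels; the sketch above only covers the input side and omits any particular self-supervised loss.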