Seeing Fast and Slow: Learning the Flow of Time in Videos

📅 2026-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work treats the rate at which time flows in a video as a learnable visual concept, enabling both perception and controllable generation of playback speed. Using self-supervision from the multimodal cues and temporal structure inherent in videos, the models learn to detect speed changes and estimate playback speed. With these models, the authors curate what they describe as the largest slow-motion video dataset to date from noisy in-the-wild sources, and demonstrate two applications: speed-conditioned video generation, which produces motion at a specified playback speed, and temporal super-resolution, which converts low-FPS, blurry videos into high-FPS sequences with fine-grained temporal detail. The study frames time as a manipulable perceptual dimension in video learning, with potential uses in temporally controllable generation and temporal forensics.

📝 Abstract

How can we tell whether a video has been sped up or slowed down? How can we generate videos at different speeds? Although videos have been central to modern computer vision research, little attention has been paid to perceiving and controlling the passage of time. In this paper, we study time as a learnable visual concept and develop models for reasoning about and manipulating the flow of time in videos. We first exploit the multimodal cues and temporal structure naturally present in videos to learn, in a self-supervised manner, to detect speed changes and estimate playback speed. We then show that these learned temporal reasoning models enable us to curate the largest slow-motion video dataset to date from noisy in-the-wild sources. Such slow-motion footage, typically filmed by high-speed cameras, contains substantially richer temporal detail than standard videos. Using this data, we further develop models capable of temporal control, including speed-conditioned video generation, which produces motion at a specified playback speed, and temporal super-resolution, which transforms low-FPS, blurry videos into high-FPS sequences with fine-grained temporal details. Our findings highlight time as a manipulable, perceptual dimension in video learning, opening doors to temporally controllable video generation, temporal forensics detection, and potentially richer world models that understand how events unfold over time.
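
The abstract does not spell out how the self-supervised speed labels are built, so the following is only a minimal sketch of one common way to obtain such labels from temporal structure alone: subsample a video at a known stride and use that stride as the training target. The function name `make_speed_example`, the parameters `clip_len` and `speeds`, and the choice of strides are assumptions made for illustration, not the authors' pipeline, and the multimodal cues the paper mentions are not modeled here.

```python
import random


def make_speed_example(frames, clip_len=8, speeds=(1, 2, 4), rng=None):
    """Build one (clip, label) pair for playback-speed classification.

    Hypothetical helper: `frames` is any indexable sequence of frames.
    The clip is taken at a randomly chosen stride, and the label is the
    index of that stride in `speeds`, so no human annotation is needed.
    """
    rng = rng or random.Random()
    # Keep only the strides for which a full clip fits inside the video.
    valid = [s for s in speeds if (clip_len - 1) * s + 1 <= len(frames)]
    if not valid:
        raise ValueError("video too short for the requested clip length")
    stride = rng.choice(valid)
    # Number of source frames covered by the clip at this stride.
    span = (clip_len - 1) * stride + 1
    start = rng.randrange(len(frames) - span + 1)
    clip = [frames[start + i * stride] for i in range(clip_len)]
    label = speeds.index(stride)
    return clip, label
```

A classifier trained on such pairs sees the same content at several apparent speeds. Note that this sketch assumes the source video plays at normal speed, which does not hold for the noisy in-the-wild footage the abstract describes.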

Problem

Research questions and friction points this paper is trying to address.

video speed

temporal reasoning

slow-motion

time perception

temporal control

Innovation

Methods, ideas, or system contributions that make the work stand out.

temporal reasoning

self-supervised learning

slow-motion video

temporal super-resolution

speed-conditioned generation
