Flowception: Temporally Expansive Flow Matching for Video Generation

📅 2025-12-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing video generation methods suffer from error accumulation, high training cost, and difficulty in jointly modeling video length and content. This paper proposes Flowception, a non-autoregressive, variable-length framework that learns a probability path interleaving discrete frame insertions with continuous frame denoising. During sampling, frame insertion acts as an efficient compression mechanism for long-term context, which alleviates the error accumulation and drift seen in autoregressive methods. Compared with full-sequence flows, Flowception reduces training FLOPs three-fold, is more amenable to local attention variants, and learns video length jointly with content. It improves FVD and VBench metrics over autoregressive and full-sequence baselines, and it handles image-to-video generation and video interpolation within one framework.

📝 Abstract

We present Flowception, a novel non-autoregressive and variable-length video generation framework. Flowception learns a probability path that interleaves discrete frame insertions with continuous frame denoising. Compared to autoregressive methods, Flowception alleviates error accumulation and drift, because the frame insertion mechanism during sampling serves as an efficient compression mechanism for long-term context. Compared to full-sequence flows, our method reduces training FLOPs three-fold, is more amenable to local attention variants, and allows the length of videos to be learned jointly with their content. Quantitative experimental results show improved FVD and VBench metrics over autoregressive and full-sequence baselines, which is further validated by qualitative results. Finally, by learning to insert and denoise frames in a sequence, Flowception seamlessly integrates different tasks such as image-to-video generation and video interpolation.
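
The interleaving of discrete insertions and continuous denoising described in the abstract can be pictured with a small toy loop. This is only an illustrative sketch, not the paper's algorithm: the function names (`toy_velocity`, `sample`), the fixed insertion probability, the per-frame step counter, the `max_len` cap, and the use of the conditioning frame as the denoising target are all assumptions made so the example runs on its own. In a real model, both the velocity and the insertion decision would be predicted by a learned network.

```python
import random


def toy_velocity(x, t, target):
    # Hypothetical stand-in for a learned velocity field. On the linear path
    # x_t = (1 - t) * x0 + t * x1, the velocity toward x1 is (x1 - x_t) / (1 - t).
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample(start_frame, max_len=8, steps=10, insert_prob=0.3, seed=0):
    """Toy sampler: alternate Euler denoising steps with random frame insertions."""
    rng = random.Random(seed)
    dim = len(start_frame)
    dt = 1.0 / steps
    target = list(start_frame)      # assumption: toy target, not a real video
    frames = [list(start_frame)]    # conditioning frame (image-to-video style)
    progress = [steps]              # per-frame step count; steps means clean

    while True:
        # Continuous part: one Euler step for every frame that is still noisy.
        for i, k in enumerate(progress):
            if k < steps:
                v = toy_velocity(frames[i], k / steps, target)
                frames[i] = [xi + dt * vi for xi, vi in zip(frames[i], v)]
                progress[i] = k + 1

        # Discrete part: maybe insert a fresh noise frame after each frame.
        inserted = False
        i = 0
        while i < len(frames) and len(frames) < max_len:
            if rng.random() < insert_prob:
                frames.insert(i + 1, [rng.gauss(0.0, 1.0) for _ in range(dim)])
                progress.insert(i + 1, 0)   # new frames start at time 0
                inserted = True
                i += 1                      # skip the frame just inserted
            i += 1

        # Stop once nothing was inserted and every frame is fully denoised.
        if not inserted and all(k == steps for k in progress):
            return frames


if __name__ == "__main__":
    video = sample([0.5, -0.25, 1.0, 0.0])
    print(len(video), "frames generated")
```

Because frames are inserted at different iterations, they sit at different noise levels at any given moment, and the final length is determined by the sampling process rather than fixed in advance. That is the property the abstract refers to as learning video length jointly with content.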

Problem

Research questions and friction points this paper is trying to address.

Generates variable-length videos non-autoregressively

Reduces error accumulation and computational cost in video generation

Integrates image-to-video and video interpolation tasks seamlessly

Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-autoregressive variable-length video generation framework

Interleaves discrete frame insertion with continuous denoising

Reduces training FLOPs and jointly learns video length

🔎 Similar Papers

Pyramidal Flow Matching for Efficient Video Generative Modeling

2024-10-08 · arXiv.org · Citations: 31

Generalizable Implicit Motion Modeling for Video Frame Interpolation

2024-07-11 · Neural Information Processing Systems · Citations: 0